You are currently browsing the monthly archive for October 2014.
Earlier this week US News and World Report (USNWR) released, for the first time, a global ranking of universities including rankings by subject area. In mathematics, the top ten universities are:
1. Berkeley
2. Stanford
3. Princeton
4. UCLA
5. University of Oxford
6. Harvard
7. King Abdulaziz University
8. Pierre and Marie Curie – Paris 6
9. University of Hong Kong
10. University of Cambridge
The past few days I’ve received a lot of email from colleagues and administrators about this ranking, and also the overall global ranking of USNWR in which Berkeley was #1. The emails generally say something to the effect of “of course rankings are not perfect, everybody knows… but look, we are amazing!”
BUT, one of the top math departments in the world, the math department at the Massachusetts Institute of Technology, is ranked #11… they didn’t even make the top ten. Even more surprising is the entry at #7: the math department at King Abdulaziz University (KAU) in Jeddah, Saudi Arabia. I’ve been in the math department at Berkeley for 15 years, and during this entire time I’ve never (to my knowledge) met a person from their math department and I don’t recall seeing a job application from any of their graduates… I honestly had never heard of the university in any scientific context. I’ve heard plenty about KAUST (the King Abdullah University of Science and Technology) during the past few years, especially because it is the first mixed-gender university campus in Saudi Arabia, is developing a robust research program based on serious faculty hires from overseas, and in a high profile move hired former Caltech president Jean-Lou Chameau to run the school. But KAU is not KAUST.
A quick Google search reveals that although KAU is nearby in Jeddah, it is a very different type of institution. It has two separate campuses for men and women. Although it was established in 1967 (Osama Bin Laden was a student there in 1975), its math department started a Ph.D. program only two years ago. According to the math department website, the chair of the department, Prof. Abdullah Mathker Alotaibi, is a 2005 Ph.D. with zero publications [Update: Nov. 10: This initial claim was based on a Google Scholar search of his full name; a reader commented below that he has published and that this claim was incorrect. Nevertheless, I do not believe it in any way materially affects the points made in this post.] This department beat MIT math in the USNWR global rankings! Seriously?
The USNWR rankings are based on 8 attributes:
– global research reputation
– regional research reputation
– publications
– normalized citation impact
– total citations
– number of highly cited papers
– percentage of highly cited papers
– international collaboration
Although KAU’s full time faculty are not very highly cited, it has amassed a large adjunct faculty that helped them greatly in these categories. In fact, in “normalized citation impact” KAU’s math department is the top ranked in the world. This amazing statistic is due to the fact that KAU employs (as adjunct faculty) more than a quarter of the highly cited mathematicians at Thomson Reuters. How did a single university assemble a group with such a large proportion of the world’s prolific (according to Thomson Reuters) mathematicians? (When I first heard this statistic from Iddo Friedberg via Twitter I didn’t believe it and had to go compute it myself from the data on the website. I guess I believe it now but I still can’t believe it!!)
In 2011 Yudhijit Bhattacharjee published an article in Science titled “Saudi Universities Offer Cash in Exchange for Academic Prestige” that describes how KAU is targeting highly cited professors for adjunct faculty positions. According to the article, professors are hired as adjunct professors at KAU for $72,000 per year in return for agreeing (apparently by contract) to add KAU as a secondary affiliation at ISIhighlycited.com and for adding KAU as an affiliation on their published papers. Annual visits to KAU are apparently also part of the “deal” although it is unclear from the Science article whether these actually happen regularly or not.
[UPDATE Oct 31, 12:14pm: A friend who was solicited by KAU sent me the invitation email with the contract that KAU sends to potential “Distinguished Adjunct Professors”. The details are exactly as described in the Bhattacharjee article:
From: "Dr. Mansour Almazroui" <ceccr@kau.edu.sa>
Date: XXXX
To: XXXX <XXXX>
Subject: Re: Invitation to Join “International Affiliation Program” at King Abdulaziz University, Jeddah Saudi Arabia

Dear Prof. XXXX ,

Hope this email finds you in good health. Thank you for your interest. Please find below the information you requested to be a “Distinguished Adjunct Professor” at KAU.

1. Joining our program will put you on an annual contract initially for one year but further renewable. However, either party can terminate its association with one month prior notice.
2. The Salary per month is $ 6000 for the period of contract.
3. You will be required to work at KAU premises for three weeks in each contract year. For this you will be accorded with expected three visits to KAU.
4. Each visit will be at least for one week long but extendable as suited for research needs.
5. Air tickets entitlement will be in Business-class and stay in Jeddah will be in a five star hotel. The KAU will cover all travel and living expenses of your visits.
6. You have to collaborate with KAU local researchers to work on KAU funded (up to $100,000.00) projects.
7. It is highly recommended to work with KAU researchers to submit an external funded project by different agencies in Saudi Arabia.
8. May submit an international patent.
9. It is expected to publish some papers in ISI journals with KAU affiliation.
10. You will be required to amend your ISI highly cited affiliation details at the ISI highlycited.com web site to include your employment and affiliation with KAU.

Kindly let me know your acceptance so that the official contract may be preceded.

Sincerely,
Mansour
]
The publication of the Science article elicited a strong rebuttal from KAU on the comments section, where it was vociferously argued that the hiring of distinguished foreign scholars was aimed at creating legitimate research collaborations, and was not merely a gimmick for increasing citation counts. Moreover, some of the faculty who had signed on defended the decision in the article. For example, Neil Robertson, a distinguished graph theorist (of Robertson-Seymour graph minors fame) explained that “it’s just capitalism,” and “they have the capital and they want to build something out of it.” He added that “visibility is very important to them, but they also want to start a Ph.D. program in mathematics,” (they did do that in 2012) and he added that he felt that “this might be a breath of fresh air in a closed society.” It is interesting to note that despite his initial enthusiasm and optimism, Professor Robertson is no longer associated with KAU.
In light of the high math ranking of KAU in the current USNWR I decided to take a closer look at who KAU has been hiring, why, and for what purpose, i.e. I decided to conduct post-publication peer review of the Bhattacharjee Science paper. A web page at KAU lists current “Distinguished Scientists” and another page lists “Former Distinguished Adjunct Professors“. One immediate observation is that out of 118 names on these pages there is 1 woman (Cheryl Praeger from the University of Western Australia). Given that KAU has two separate campuses for men and women, it is perhaps not surprising that women are not rushing to sign on, and perhaps KAU is also not rushing to invite them (I don’t have any information one way or another, but the underrepresentation seems significant). Aside from these faculty, there is also a program aptly named the “Highly Cited Researcher Program” that is part of the Center for Excellence in Genomic Medicine Research. Fourteen faculty are listed there (all men, zero women). But guided by the Science article which described the contract requirement that researchers add KAU to their ISI affiliation, I checked for adjunct KAU faculty at Thomson-Reuters ResearcherID.com and there I found what appears to be the definitive list.
Although Neil Robertson has left KAU, he has been replaced by another distinguished graph theorist, namely Carsten Thomassen (no accident as his wikipedia page reveals that “He was included on the ISI Web of Knowledge list of the 250 most cited mathematicians.”) This is a name I immediately recognized due to my background in combinatorics; in fact I read a number of Thomassen’s papers as a graduate student. I decided to check whether it is true that adjunct faculty are adding KAU as an affiliation on their articles. Indeed, Thomassen has done exactly that in his latest publication Strongly 2-connected orientations of graphs published this year in the Journal of Combinatorial Theory Series B. At this point I started having serious reservations about the ethics of faculty who have agreed to be adjuncts at KAU. Regardless of the motivation of KAU in hiring adjunct highly cited foreign faculty, it seems highly inappropriate for a faculty member to list an affiliation on a paper to an institution to which they have no scientific connection whatsoever. I find it very hard to believe that serious graph theory is being researched at KAU, an institution that didn’t even have a Ph.D. program until 2012. It is inconceivable that Thomassen joined KAU in order to find collaborators there (he mostly publishes alone), or that he suddenly found a great urge to teach graph theory in Saudi Arabia (KAU had no Ph.D. program until 2012). The problem is also apparent when looking at the papers of researchers in genomics/computational biology that are adjuncts at KAU. I recognized a number of such faculty members, including high-profile names from my field such as Jun Wang, Manolis Dermitzakis and John Huelsenbeck. I was surprised to see their names (none of these faculty mention KAU on their websites) yet in each case I found multiple papers they have authored during the past year in which they list the KAU affiliation. 
I can only wonder whether their home institutions find this appropriate. Then again, maybe KAU is also paying the actual universities the faculty they are citation borrowing belong to? But assume for a moment that they aren’t, then why should institutions share the credit they deserve for supporting their faculty members by providing them space, infrastructure, staff and students with KAU? What exactly did KAU contribute to Kilpinen et al. Coordinated effects of sequence variation on DNA binding, chromatin structure and transcription, Science, 2013? Or to Landis et al. Bayesian analysis of biogeography when the number of areas is large, Systematic Biology, 2013? These papers have no authors or apparent contribution from KAU. Just the joint affiliation of the adjunct faculty member. The limit of the question arises in the case of Jun Wang, director of the Beijing Genome Institute, whose affiliations are BGI (60%), University of Copenhagen (15%), King Abdulaziz University (15%), The University of Hong Kong (5%), Macau University of Science and Technology (5%). Should he also acknowledge the airlines he flies on? Should there not be some limit on the number of affiliations of an individual? Shouldn’t journals have a policy about when it is legitimate to list a university as an affiliation for an author? (e.g. the author must have in some significant way been working at the institution).
Another, bigger, disgrace that emerged in my examination of the KAU adjunct faculty is the issue of women. Aside from the complete lack of women in the “Highly Cited Researcher Program”, I found that most of the genomics adjunct faculty hired via the program will be attending an all-male conference in three weeks. The “Third International Conference on Genomic Medicine” will be held from November 17–20th at KAU. This conference has zero women. The same meeting last year… had zero women. I cannot understand how in 2014, at a time when many are speaking out strongly about the urgency of supporting females in STEM and in particular about balancing meetings, a bunch of men are willing to forgo all considerations of gender equality for the price of ~$3 per citation per year (a rough calculation using the figure of $72,000 per year from the Bhattacharjee paper and 24,000 citations for a highly cited researcher). To be clear I have no personal knowledge about whether the people I’ve mentioned in this article are actually being paid or how much, but even if they are being paid zero it is not ok to participate in such meetings. Maybe once (you didn’t know what you are getting into), but twice?!
As for KAU, it seems clear based on the name of the “Highly Cited Researcher Program” and the fact that they advertise their rankings that they are specifically targeting highly cited researchers much more for their delivery of their citations than for development of genuine collaborations (looking at the adjunct faculty I failed to see any theme or concentration of people in any single area as would be expected in building a coherent research program). However I do not fault KAU for the goal of increasing the ranking of their institution. I can see an argument for deliberately increasing rankings in order to attract better students, which in turn can attract faculty. I do think that three years after the publication of the Science article, it is worth taking a closer look at the effects of the program (rankings have increased considerably but it is not clear that research output from individuals based at KAU has increased), and whether this is indeed the most effective way to use money to improve the quality of research institutions. The existence of KAUST lends credence to the idea that the king of Saudi Arabia is genuinely interested in developing Science in the country, and there is a legitimate research question as to how to do so with the existing resources and infrastructure. Regardless of how things ought to be done, the current KAU emphasis on rankings is a reflection of the rankings, which USNWR has jumped into with its latest worldwide ranking. The story of KAU is just evidence of a bad problem getting worse. I have previously thought about the bad version of the problem:
A few years ago I wrote a short paper with my (now former) student Peter Huggins on university rankings:
P. Huggins and L.P., Selecting universities: personal preferences and rankings, arXiv, 2008.
It exists only as an arXiv preprint as we never found a suitable venue for publication (this is code for the paper was rejected upon peer review; no one seemed interested in finding out the extent to which the data behind rankings can produce a multitude of stories). The article addresses a simple question: given that various attributes have been measured for a bunch of universities, and assuming they are combined (linearly) into a score used to produce rankings, how do the rankings depend on the weightings of the individual attributes? The mathematics is that of polyhedral geometry, where the problem is to compute a normal fan of a polytope whose vertices encode all the possible rankings that can be obtained for all possible weightings of the attributes (an object we called the unitope). An example is shown below, indicating the possible rankings as determined by weightings chosen among three attributes measured by USNWR (freshman retention, selectivity, peer assessment). It is important to keep in mind this is data from 2007-2008.
Our paper had an obvious but important message: rankings can be very sensitive to the attribute weightings. Of course some schools such as Harvard came out on top regardless of attribute preferences, but some schools, even top ranked schools, could shift by over 50 positions. Our conclusion was that although the data collected by USNWR was useful, the specific weighting chosen and the ranking it produced were not. Worse than that, sticking to a single choice of weightings was misleading at best, dangerous at worst.
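To make the sensitivity concrete, here is a minimal Python sketch. It is not the normal fan computation from the paper; it simply scans a coarse grid of nonnegative weights summing to one and records the best and worst rank each school attains under a linear score. The school names, the scores, and the function names `ranking` and `rank_ranges` are hypothetical and chosen only for illustration (they are not USNWR data).

```python
from itertools import product

# Hypothetical scores on three attributes (0-100); made up for illustration,
# NOT taken from USNWR.
SCORES = {
    "School A": (95, 90, 92),
    "School B": (85, 99, 70),
    "School C": (70, 75, 99),
    "School D": (88, 80, 85),
}


def ranking(weights, scores=SCORES):
    """Return school names ordered by weighted linear score, best first."""
    def total(name):
        return sum(w * s for w, s in zip(weights, scores[name]))
    # Ties (if any) are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda name: (-total(name), name))


def rank_ranges(scores=SCORES, steps=10):
    """Best and worst rank per school over a grid of weights summing to 1."""
    ranges = {name: [len(scores), 1] for name in scores}  # [best, worst]
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue  # the third weight would be negative
        weights = (i / steps, j / steps, (steps - i - j) / steps)
        for pos, name in enumerate(ranking(weights, scores), start=1):
            ranges[name][0] = min(ranges[name][0], pos)
            ranges[name][1] = max(ranges[name][1], pos)
    return {name: tuple(r) for name, r in ranges.items()}


if __name__ == "__main__":
    for name, (best, worst) in rank_ranges().items():
        print(f"{name}: best rank {best}, worst rank {worst}")
```

With these made-up numbers School A never drops below second place, while Schools B and C each range from first to last depending on the weights, which is the qualitative behavior described above.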
I was reminded of this paper when looking at the math department rankings just published by USNWR. When I saw that KAU was #7 I was immediately suspicious, and even Berkeley’s #1 position bothered me (even though I am a faculty member in the department). I immediately guessed that they must have weighted citations heavily, because our math department has applied math faculty, and KAU has their “highly cited researcher program”. Averaging citations across faculty from different (math) disciplines is inherently unfair. In the case of Berkeley, my applied math colleague James Sethian has a paper on level set methods with more than 10,000 (Google Scholar) citations. This reflects the importance and advance of the paper, but also the huge field of users of the method (many, if not most, of the disciplines in engineering). On the other hand, my topology colleague Ian Agol’s most cited paper has just over 200 citations. This is very respectable for a mathematics paper, but even so it doesn’t come close to reflecting his true stature in the field, namely the person who settled the Virtually Haken Conjecture thereby completing a long standing program of William Thurston that resulted in many of the central open problems in mathematics (Thurston was also incidentally an adjunct faculty member at KAU for some time). In other words, not only are citations not everything, they can also be not anything. By comparing citations across math departments that are diverse to very differing degrees USNWR rendered the math ranking meaningless. Some of the other data collected, e.g. reputation, may be useful or relevant to some, and for completeness I’m including it with this post (here) in a form that allows for it to be examined properly (USNWR does not release it in the form of a table, but rather piecemeal within individual html pages on their site), but collating the data for each university into one number is problematic. 
In my paper with Peter Huggins we show both how to evaluate the sensitivity of rankings to weightings and also how to infer bounds on the weightings by USNWR from the rankings. It would be great if USNWR included the ability to perform such computations with their data directly on their website but there is a reason USNWR focuses on citations.
The impact factor of a journal is a measure of the average amount of citation per article. It is computed by averaging the citations over all articles published during the preceding two years, and its advertisement by journals reflects a publishing business model where demand for the journal comes from the impact factor, profit from free peer reviewing, and sales from closed subscription based access. Everyone knows the peer review system is broken, but it’s difficult to break free of when incentives are aligned to maintain it. Moreover, it leads to a perverse focus of academic departments on the journals their faculty are publishing in and the citations they accumulate. Rankings such as those by USNWR reflect the emphasis on citations that originates with the journals, and so one cannot fault USNWR for including it as a factor and weighting it highly in their rankings. Having said that, USNWR should have known better than to publish the KAU math rankings; in fact it appears their publication might be a bug. The math department rankings are the only rankings that appear for KAU. They have been omitted entirely from the global overall ranking and other departmental rankings (I wonder if this is because USNWR knows about the adjunct faculty purchase). In any case, the citation frenzy feeds departments that in aggregate form universities, and universities such as King Abdulaziz may reach the point where they feel compelled to enter into the market of citations to increase their overall profile…
I hope this post frightened you. It should. Happy Halloween!
[Update: Dec. 6: an article about KAU and citations has appeared in the Daily Cal, Jonathan Eisen posted his exchanges with KAU, and he has storified the tweets]
This year half of the Nobel prize in Physiology or Medicine was awarded to May-Britt Moser and Edvard Moser, who happen to be both a personal and professional couple. Interestingly, they are not the first but rather the fourth couple to win the prize jointly: In 1903 Marie Curie and Pierre Curie shared the Nobel prize in physics, in 1935 Frédéric Joliot and Irène Joliot-Curie shared the Nobel prize in chemistry, and in 1947 Carl Cori and Gerty Cori also shared the Nobel prize in physiology or medicine. It seems working on science with a spouse or partner can be a formula for success. Why then, when partners apply together for academic jobs, do universities refer to them as “two body problems”?
The “two-body problem” is a question in physics about the motions of pairs of celestial bodies that interact with each other gravitationally. It is a special case of the difficult “N-body problem” but simple enough that it is (completely) solved; in fact it was solved by Johann Bernoulli a few centuries ago. The use of the term in the context of academic job searches has always bothered me: it suggests that hiring in academia is an exercise in mathematical physics (it is certainly not!) and even if one presumes that it is, the term is an oxymoron because in physics the problem is solved whereas in academia it is used in a way that implies it is unsolvable. There are countless times I have heard my colleagues sigh “so and so would be great but there is a two body problem”. Semantics aside, the allusion to high brow physics problems in the process of academic hiring betrays a complete lack of understanding of the basic mathematical notion of epistasis relevant in the consideration of joint applications, not to mention an undercurrent of sexism that plagues science and engineering departments everywhere. The results are poor hiring decisions, great harm to the academic prospects of partners and couples, and imposition of stress and anxiety that harms the careers of those who are lucky enough to be hired by the flawed system.
I believe it was Aristotle who first used the phrase “the whole is greater than the sum of its parts”. The old adage remains true today: owning a pair of matching socks is more than twice as good as having just one sock. This is called positive epistasis, or synergy. Of course the opposite may be true as well: a pair of individuals trying to squeeze through a narrow doorway together will take more than twice as long as they would if they just went through one at a time. This would be negative epistasis. There is a beautiful algebra and geometry associated to positive/negative epistasis that is useful to understand, because its generalizations reveal a complexity to epistasis that is very much at play in academia.
Formally, thinking of two “parts”, we can represent them as two bit strings: 01 for one part and 10 for the other. The string 00 represents the situation of having neither part, and 11 having both parts. A “fitness function” $f$ assigns to each string a value. Epistasis is defined to be the sign of the linear form
$u = f(00) + f(11) - f(01) - f(10)$.
That is, $u > 0$ is positive epistasis, $u < 0$ is negative epistasis and $u = 0$ is no epistasis. In the case where $f(00) = 0$, “the whole is greater than the sum of its parts” means that $f(11) > f(01) + f(10)$ and “the whole is less than the sum of its parts” means $f(11) < f(01) + f(10)$. There is an accompanying geometry that consists of drawing a square in the x-y plane whose corners are labeled by $00, 01, 10$ and $11$. At each corner, the function $f$ can be represented by a point on the z axis, as shown in the example below:
The black line dividing the square into two triangles comes about by imagining that there are poles at the corners of the square, of height equal to the fitness value, and then that a tablecloth is draped over the poles and stretched taut. The picture above then corresponds to the leftmost panel below:
The crease is the result of projecting down onto the square the “fold” in the tablecloth (assuming there is a fold). In other words, positive and negative epistasis can be thought of as corresponding to one of the two triangulations of the square. This is the geometry of two parts but what about n parts? We can similarly represent them by n bit strings with the “whole” corresponding to $11\cdots1$. Assuming that the parts can only be added up all together, the geometry now works out to be that of triangulations of the hyperbipyramid; the case $n=3$ is shown below:
“The whole is greater than the sum of its parts”: the superior-inferior slice.
“The whole is less than the sum of its parts”: the transverse slice.
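For readers who prefer code to pictures, here is a small Python sketch of the sign computation. Writing f for the fitness function, the two-part linear form is u = f(00) + f(11) - f(01) - f(10). The function names, the tolerance, and the toy fitness values are my own choices for illustration. The n-part formula in `all_together_epistasis` is also an assumption on my part: it is the linear form coming from the single affine dependency among the corners of the bipyramid (the string 11…1 plus n-1 copies of 00…0 equals the sum of the n single-part strings), which reduces to u when n = 2.

```python
def epistasis(f):
    """Two parts: u = f(00) + f(11) - f(01) - f(10) for a fitness table f."""
    return f["00"] + f["11"] - f["01"] - f["10"]


def all_together_epistasis(f_none, f_parts, f_whole):
    """n parts that can only be combined all at once (assumed generalization).

    f_none  : fitness of having no parts (string 00...0)
    f_parts : list of fitnesses of each single part on its own
    f_whole : fitness of having all parts (string 11...1)
    """
    n = len(f_parts)
    return f_whole + (n - 1) * f_none - sum(f_parts)


def classify(u, tol=1e-12):
    """Name the sign of the linear form; tol guards against rounding noise."""
    if u > tol:
        return "positive"
    if u < -tol:
        return "negative"
    return "none"


if __name__ == "__main__":
    # Toy values: a matching pair of socks is worth more than two single socks.
    socks = {"00": 0, "01": 1, "10": 1, "11": 3}
    print(classify(epistasis(socks)))

    # Toy values: three parts whose whole is worth less than the parts combined.
    print(classify(all_together_epistasis(0, [1, 1, 1], 2)))
```

When the fitness of having no parts is zero, a positive value of either form says the whole is greater than the sum of its parts and a negative value says it is less.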
With multiple parts epistasis can become more complicated if one allows for arbitrary combining of parts. In a paper written jointly with Niko Beerenwinkel and Bernd Sturmfels titled “Epistasis and shapes of fitness landscapes“, we developed the mathematics for the general case and showed that epistasis among n objects allowed to combine in all possible ways corresponds to the different triangulations of a hypercube. For example, in the case of three objects, the square is replaced by the cube with eight corners corresponding to the eight bit strings of length 3. There are 74 triangulations of the cube, falling into 6 symmetry classes. The complete classification is shown below (for details on the meaning of the GKZ vectors and out-edges see the paper):
There is a beautiful geometry describing how the different epistatic shapes (or triangulations) are related, which is known as the secondary polytope. Its vertices correspond to the triangulations and two are connected by an edge when they are the same except for the “flip” of one pair of neighboring tetrahedra. The case of the cube is shown below:
The point of the geometry, and its connection to academic epistasis that I want to highlight in this post, is made clear when considering the case of $n=4$. In that case the number of different types of epistatic interactions is given by the number of triangulations of the 4-cube. There are 87,959,448 triangulations and 235,277 symmetry types! In other words, the intuition from two parts that “interaction” can be positive, negative or neutral is difficult to generalize without math, and the point is there are a myriad of ways a faculty in a large department can be interacting both to the benefit and the detriment of their overall scientific output.
In many searches I’ve been involved in the stated principle for hiring is “let’s hire the best person”. Sometimes the search may be restricted to a field, but it is not uncommon that the search is open. Such a hiring policy deliberately ignores epistasis, and I think it’s crazy, not to mention sexist, because the policy affects and hurts women applicants far more than it does men. Not because women are less likely to be “the best” in their field, in fact quite the opposite. It is very common for women in academia to be partnered with men who are also in academia, and inevitably they suffer for that fact because departments have a hard time reconciling that both could be “the best”. There are also many reasons for departments to think epistatically that go beyond basic fairness principles. For example, in the case of partners that are applying together to a university, even if they are not working together on research, it is likely that each one will be far more productive if the other has a stable job at the same institution. It is difficult to manage a family if one partner needs to commute hours, or in some cases days, to work. I know of a number of couples in academia that have jobs in different states.
In the last few years a few couples have been bold enough to openly declare themselves “positively epistatic”. What I mean is that they apply jointly as a single applicant, or “joint lab” in the case of biology. For example, there is the case of the Altschuler-Wu lab that has recently relocated to UCSF or the Eddy-Rivas lab that is relocating to Harvard. Still, such cases are few and far between, and for the most part hiring is inefficient, clumsy and unfair (it is also worth noting that there are many other epistatic factors that can and should be considered, for example the field someone is working in, collaborators, etc.).
Epistasis has been carefully studied for a long time in population and statistical genetics, where it is fundamental in understanding the effects of genotype on phenotype. The geometry described above can be derived for diploid genomes and this was done by Ingileif Hallgrímsdóttir and Debbie Yuster in the paper “A complete classification of epistatic two-locus models” from 2008. In the paper they examine a previous classification of epistasis among 30 pairs of loci in a QTL analysis of growth traits in chicken (Carlborg et al., Genome Research 2003). The (re)-classification is shown in the figure below:
If we can classify epistasis for chickens in order to understand them, we can certainly assess the epistasis outlook for our potential colleagues, and we should hire accordingly.
It’s time that the two body problem be recognized as the two body opportunity.
This is part (2/2) about my travel this past summer to Iceland and Israel:
In my previous blog post I discussed the genetics of Icelanders, and the fact that most Icelanders can trace their roots back dozens of generations, all the way to Vikings from ca. 900AD. The country is homogeneous in many other ways as well (religion, income, etc.), and therefore presents a stark contrast to the other country I visited this summer: Israel. Even though I’ve been to Israel many times since I was a child, now that I am an adult the manifold ethnic, social and religious makeup of the society is much more evident to me. This was particularly true during my visit this past summer, during which political and military turmoil in the country served to accentuate differences. There are Armenians, Ashkenazi Jews, Bahai, Bedouin, Beta Israel, Christian Arabs, Circassians, Copts, Druze, Maronites, Muslim Arabs, Sephardic Jews etc. etc. etc., and additional “diversity” caused by political splits leading to West Bank Palestinians, Gaza Palestinians, Israelis inside vs. outside the Green Line, etc. etc. etc. (and of course many individuals fall into multiple categories). It’s fair to say that “it’s complicated”. Moreover, the complex fabric that makes up Israeli society is part of a larger web of intertwined threads in the Middle East. The “Arab countries” that neighbor Israel are also internally heterogeneous and complex, both in obvious ways (e.g. the Sunni vs. Shia division), but also in many more subtle ways (e.g. language).
The 2014 Israel-Gaza conflict started on July 8th. Having been in Israel for 4 weeks I was interacting closely with many friends and colleagues who were deeply impacted by the events (e.g. their children were suddenly called up to partake in a war), and among them I noticed almost immediately an extreme polarization that reflected a public relations battle being waged between Hamas and Israel that played out more intensely than in any previous conflict on news channels and social media. The polarization extended to friends and acquaintances outside of Israel. Everyone had a very strong opinion. I also noticed graphic memes being passed around in which the conflict was projected onto a two-colored map. For example, the map below was passed around on Facebook showing the (“real democratic”) Israel surrounded by a sea of Arab green in the Middle East:
I started noticing other bifurcating maps as other Middle East issues came to the fore later in the summer. Here is a map from a website depicting the Sunni-Shia divide:
In many cases the images being passed around were explicitly encouraging a “one-dimensional” view of the conflict(s), whereas in other cases the “us” vs. “them” factor was more subliminal. The feeling that I was being programmed how to think made me uncomfortable.
Moreover, the Middle East memes that were flooding my inbox were distracting me. I had visited Israel to nurture and establish connections and collaborations with the large number of computational biologists in the country. During my trip I was kindly hosted by Yael Mandel-Gutfreund at the Technion, and also had the honor of being an invited speaker at the annual Israeli Bioinformatics Society meeting. The visit was not supposed to be a bootcamp in salon politics. In any case, I found myself thinking about the situation in the Middle East with a computational biology mindset, and I was struck by the following “Middle East Friendship Chart” published in July that showed data about the relationships of the various entities/countries/organizations:
As a (computational) biologist I was keen to understand the data in a visual way that would reveal the connections more clearly, and as a computational (biologist) faced with ordinal data I thought immediately of non-metric multi-dimensional scaling as a way to depict the information in the matrix. I have discussed classic multi-dimensional scaling (or MDS) in a previous blog post, where I explained its connection to principal component analysis. In the case of ordinal data, non-metric MDS seeks to find points in a low-dimensional Euclidean space so that the ranks of distances correspond to the input ordinal matrix. It has been used in computational biology, for example in the analysis of gene expression matrices. The idea originates with a classic paper by Kruskal, which remains a good reference for understanding non-metric MDS. The key idea is summarized in his Figure 4:
Formally, in Kruskal’s notation, given a dissimilarity map $\delta$ (a symmetric matrix with zeroes on the diagonal and nonnegative entries $\delta_{ij}$), the goal is to find points $x_1,\ldots,x_n$ in $\mathbb{R}^t$ so that their pairwise distances match $\delta$ in rank. In Kruskal’s Figure 4, points on the plot correspond to pairs of points in $\mathbb{R}^t$, and $\delta_{ij}$ is shown on the y-axis, while the Euclidean distance between the points, represented by $d_{ij}$, is shown on the x-axis. Monotonically increasing values $\hat{d}_{ij}$ are then chosen so that $S^* = \sum_{i<j} (d_{ij} - \hat{d}_{ij})^2$ is minimized. The function $S$, obtained by normalizing $S^*$, is called the “stress” function; the normalization makes the “stress” invariant up to scaling of the points. An iterative procedure can then be used to optimize the points, although results depend on which starting configuration is chosen, and for this reason multiple starting positions are considered.
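To make the stress computation concrete, here is a minimal sketch in Python. It is not the R implementation used for this post. It assumes the standard normalization (raw stress divided by the sum of squared distances, then a square root), a pool-adjacent-violators pass for the monotone fit, and tie-breaking by distance among equal dissimilarities. The function names `pava` and `kruskal_stress` are my own, chosen for illustration.

```python
import math


def pava(values):
    """Least-squares non-decreasing fit to `values` (pool adjacent violators)."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def kruskal_stress(points, delta):
    """Normalized stress of a configuration `points` (list of coordinate tuples)
    against a symmetric dissimilarity matrix `delta` (list of lists)."""
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # Euclidean distance for every pair of points.
    d = {p: math.dist(points[p[0]], points[p[1]]) for p in pairs}
    # Order pairs by dissimilarity; ties are ordered by distance (an assumption),
    # so tied dissimilarities impose no constraint on the fitted values.
    order = sorted(pairs, key=lambda p: (delta[p[0]][p[1]], d[p]))
    d_hat = pava([d[p] for p in order])  # monotone fitted values
    raw = sum((d[p] - h) ** 2 for p, h in zip(order, d_hat))
    scale = sum(d[p] ** 2 for p in pairs)
    return math.sqrt(raw / scale)


# Toy check: three points on a line whose distance ranks agree with delta.
line = [(0.0,), (1.0,), (3.0,)]
agree = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
print(kruskal_stress(line, agree))  # 0.0, since the ranks match exactly
```

An optimizer would then move the points to reduce this value, restarting from several random configurations and keeping the best result.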
I converted the smiley/frowny faces into numbers 0, 1 or 2 (for red, yellow and green faces respectively) and was able to easily experiment with non-metric MDS using an implementation in R. The results for a 2D scaling of the friendship matrix are shown in the figure below:
It is evident that, as expected from the friendship matrix, ISIS is an outlier. One also sees some of “the enemy of thine enemy is thy friend”. What is interesting is that in some cases the placements are clearly affected by shared allegiances and mutual dislikes that are complicated in nature. For example, the reason Saudi Arabia is placed between Israel and the United States is the friendship of the U.S. towards Iraq in contrast to Israel’s relationship to the country. One interesting question that is not addressed by the non-metric MDS approach is what the direct influences are. For example, it stands to reason that Israel is neutral to Saudi Arabia partly because of the U.S. friendship with the country; can this be inferred from the data in the same way that causative links are inferred for gene networks? In any case, I thought the scaling was illuminating. It seems like an interesting exercise to extend the analysis to more countries/organizations/entities, but it may be necessary to deal with missing data and I don’t have the time to do it.
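For anyone who wants to try such an extension, the encoding step described above (faces to the numbers 0, 1, 2) might be sketched as follows in Python. The scores for red, yellow and green are the ones from this post. Everything else is an assumption made for illustration: the conversion to a dissimilarity as 2 minus the score, the use of `None` for pairs with no data, and the placeholder entities "A", "B", "C", which are not entries from the actual chart.

```python
# Scores as described in the post: red = 0, yellow = 1, green = 2.
FACE_SCORE = {"red": 0, "yellow": 1, "green": 2}


def dissimilarity_matrix(names, faces):
    """Build a symmetric dissimilarity matrix from face colours.

    `faces` maps a pair of names (a, b) to "red", "yellow" or "green".
    Assumption: dissimilarity = 2 - score, so friends end up "close".
    Pairs that are not listed stay None (missing data).
    """
    index = {name: k for k, name in enumerate(names)}
    n = len(names)
    delta = [[0 if i == j else None for j in range(n)] for i in range(n)]
    for (a, b), colour in faces.items():
        i, j = index[a], index[b]
        delta[i][j] = delta[j][i] = 2 - FACE_SCORE[colour]
    return delta


# Placeholder entities and relationships, NOT taken from the real chart.
names = ["A", "B", "C"]
faces = {("A", "B"): "green", ("A", "C"): "red", ("B", "C"): "yellow"}
delta = dissimilarity_matrix(names, faces)
for name, row in zip(names, delta):
    print(name, row)
```

Any `None` entries would have to be imputed or excluded before the matrix is handed to a non-metric MDS routine, which is the missing-data problem mentioned above.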
I did decide to look at the 1D non-metric MDS, to see whether there is a meaningful one-dimensional representation of the matrix, consistent with some of the maps I’d seen. As it turns out, this is not what the data suggests. The one-dimensional scaling shown below places ISIS in the middle, i.e. as the “neutral” country!
Israel -4.55606607
Saudi Arabia -3.62249810
Turkey -3.04579321
United States -2.6429534
Egypt -1.12919328
Al-Qaida -0.38125270
Hamas 0.01629508
ISIS 0.40101149
Palestinian Authority 1.55546030
Iraq 2.23849150
Hezbollah 2.66933449
Iran 3.29650784
Syria 5.20065616
This failure of non-metric MDS is simply a reflection of the fact that the friendship matrix is not “one-dimensional”. The Middle East is not one-dimensional. The complex interplay of Sunni vs. Shia, terrorist vs. freedom fighter, Muslim vs. infidel, and all the rest of what is going on makes it incorrect to think of the conflict in terms of a single attribute. The complex pattern of alliances and conflicts is therefore not well explained by two-colored maps, and the computations described above provide some kind of a “proof” of this fact. The friendship matrix also explains why it’s difficult to have meaningful discussions about the Middle East in 140 characters, or in Facebook tirades, or with soundbites on cable news. But as complicated as the Middle East is, I have no doubt that the “friendship matrix” of my colleagues in computational biology would require even higher dimension…