In the August 2013 issue of Nature Biotechnology there were two back-to-back methods papers published in the area of network theory:

- Baruch Barzel & Albert-László Barabási, Network link prediction by global silencing of indirect correlations, *Nature Biotechnology* **31**(8), 2013, p 720–725. doi:10.1038/nbt.2601.
- Soheil Feizi, Daniel Marbach, Muriel Médard & Manolis Kellis, Network deconvolution as a general method to distinguish direct dependencies in networks, *Nature Biotechnology* **31**(8), 2013, p 726–733. doi:10.1038/nbt.2635.

This post is the first of a trilogy (part2, part3) in which my student Nicolas Bray and I tell the story of these papers and why we took the time to read them and critique them.

We start with the Barzel-Barabási paper, which is about the applications of a model proposed by Barzel and his Ph.D. advisor, Ofer Biham (although all last names start with a B, Biham is not to be confused with Barabási).

In order to quantify connectivity in biological networks, Barzel and Biham proposed an experimental perturbation model in the paper Baruch Barzel & Ofer Biham, Quantifying the connectivity of a network: The network correlation function method, *Phys. Rev. E* **80**, 046104 (2009) that forms the basis for network link prediction in Barzel-Barabási. In the context of biology, link prediction refers to the problem of identifying functional links between genes from data that may be confounded by indirect effects. For example, if gene A inhibits the expression of gene B, and gene B in turn inhibits the expression of gene C, then an increase in the expression of A will decrease the expression of B, which in turn increases C. Therefore one might observe correlation in the expression levels of genes A and C, even though there is no direct interaction between them. The Barzel-Biham model is based on perturbation experiments. Assuming that a system of genes is in equilibrium, it is a model for the change in expression of one gene in response to a small perturbation in another.

The parameters in the Barzel-Biham model are entries in what they call a “local response matrix” *S* (any matrix with $S_{ii} = 0$ for all *i*). Physical arguments pertaining to perturbations at equilibrium lead to the equations

$SG = G$ (off the diagonal) and $G_{ii} = 1$ for all *i*. (1)

for a “global response matrix” *G* that can, in principle, be observed and used to infer the matrix *S*. The innovation of Barzel and Barabási is to provide an approximate formula for recovering *S* from *G*, specifically the formula

$S \approx \left(G - I + D\big((G - I)G\big)\right) G^{-1}$ (2)

where *D(M)* denotes the operation setting off-diagonal elements of *M* to zero. A significant part of the paper is devoted to showing that the approximation (2) is good. Then they suggest that (2) can be used to infer direct causal links in regulatory networks from collections of expression experiments. Barzel and Barabási claim that the approximation formula (2) is necessary because exact inference of *S* from *G* requires solving the intractable system of equations

$SG = G$ (off the diagonal) and $S_{ii} = 0$ for all *i*. (3)

The assertion of intractability is based on the claim that the equations are coupled. They reason that since the naïve matrix inversion algorithm requires $O(m^3)$ operations for *m* equations, the solution of (3), a system of roughly $n^2$ equations, would require time $O(n^6)$. When we looked at this system, our first thought was that while it is large, it is also structured. We sat down and started examining it by writing down the equations for a simple case: a matrix *S* for a graph on 3 nodes. We immediately noticed that the equations decouple into *n* systems of *n* equations, where system *i* is given by $S_{ii} = 0$ and $\sum_k S_{ik} G_{kj} = G_{ij}$ for $j \neq i$, with the *n* unknowns $S_{i1}, \ldots, S_{in}$. This immediately reduces the complexity to $O(n^4)$, or even $O(n^3)$ by simple parallelization. In other words, **the system is trivially tractable.**
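The decoupling can be checked numerically. The following is a minimal sketch, assuming the system reads $\sum_k S_{ik} G_{kj} = G_{ij}$ for $j \neq i$ with $S_{ii} = 0$; the function names, the 3-node matrix `S_true`, and the way a consistent *G* is constructed for the check are illustrative choices, not code from either paper.

```python
import numpy as np

def global_from_local(S):
    """Illustrative helper: build a G with unit diagonal such that SG = G off the diagonal."""
    H = np.linalg.inv(np.eye(S.shape[0]) - S)
    return H / np.diag(H)  # divide column j by H[j, j]

def solve_rowwise(G):
    """Recover S one row at a time; row i is its own small linear system."""
    n = G.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        others = [k for k in range(n) if k != i]  # S[i, i] = 0 is fixed
        A = G[np.ix_(others, others)].T           # one equation per j != i
        b = G[i, others]
        S[i, others] = np.linalg.solve(A, b)
    return S

# Hypothetical 3-node network with both activating and inhibiting links
S_true = np.array([[0.0, 0.5, 0.0],
                   [-0.3, 0.0, 0.2],
                   [0.4, 0.0, 0.0]])
G = global_from_local(S_true)
S_rec = solve_rowwise(G)
print(np.allclose(S_rec, S_true))
```

Each of the *n* row systems is solved independently with a dense solver, which is where the $O(n^4)$ total comes from; the loop iterations do not depend on one another and could be run in parallel.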

But there is more: while looking at the paper I had to take a quick bathroom break, and by the time I returned Nick had realized he could apply the Sherman-Morrison formula to obtain the following formula for the exact solution:

$S = I - D(1/G^{-1})\,G^{-1}$. (4)

Here the operator “/” denotes element-wise division, a simple operation to execute, so that **inferring *S* from *G* requires no more than inverting *G* and scaling it**, a formula that is also *much* simpler and more efficient to compute than (2). [Added 2/23: Jordan Ellenberg pointed out the obvious fact that $SG = G$ off the diagonal means that $(I - S)G = \Delta$ for some diagonal matrix $\Delta$, and therefore $S = I - \Delta G^{-1}$, and since the diagonal entries of *S* must be zero it follows that $\Delta = D(1/G^{-1})$. In other words, the Sherman-Morrison formula is not even needed.] While it would be nice for us to claim that our managing to quickly supersede the main result of a paper published in Nature Biotechnology was due to some sort of genius, in fact **the entire exercise would be suitable for an undergraduate linear algebra homework problem**. Barabási likes to compare himself to the great physicist and Nobel laureate Subrahmanyan Chandrasekhar, but it is difficult to imagine the great Chandrasekhar having similar difficulties.
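For readers who want to try the exact inversion, here is a short sketch of it in NumPy. It assumes the formula has the form $S = I - D(1/G^{-1})\,G^{-1}$, which is what Ellenberg's argument yields; the function name and the small test network are hypothetical.

```python
import numpy as np

def local_from_global(G):
    """Exact inversion: S = I - D(1/G^{-1}) G^{-1}."""
    Ginv = np.linalg.inv(G)
    # Left-multiplying by a diagonal matrix scales rows:
    # row i of G^{-1} is divided by (G^{-1})[i, i].
    scaled = Ginv / np.diag(Ginv)[:, None]
    return np.eye(G.shape[0]) - scaled

# Hypothetical 3-node network used only to check the formula
S_true = np.array([[0.0, 0.5, 0.0],
                   [-0.3, 0.0, 0.2],
                   [0.4, 0.0, 0.0]])
H = np.linalg.inv(np.eye(3) - S_true)
G = H / np.diag(H)  # a G with unit diagonal consistent with S_true

S_rec = local_from_global(G)
print(np.allclose(S_rec, S_true))
```

The whole computation is one matrix inverse followed by a row scaling, so its cost is that of inverting *G*.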

The approach to solving (3) has an implication that is even more important than the solution itself. It provides a dual formula for computing *G* from *S* according to (1), i.e. to simulate from the model. Using the same ideas as above, one finds that

$G = H\,D(1/H)$ where $H = (I - S)^{-1}$. (5)
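Simulating from the model is then a couple of lines. The sketch below assumes the dual formula is $G = H\,D(1/H)$ with $H = (I - S)^{-1}$; the random sparse, mixed-sign *S* is a made-up test network whose entries are kept small so that $I - S$ is safely invertible.

```python
import numpy as np

def simulate_global(S):
    """G = H D(1/H) with H = (I - S)^{-1}: scale column j of H by 1/H[j, j]."""
    H = np.linalg.inv(np.eye(S.shape[0]) - S)
    return H / np.diag(H)

rng = np.random.default_rng(0)
n = 50
# Hypothetical network: about 10% of links present, signs mixed, zero diagonal.
# Entries are bounded by 0.5/n so every row of |S| sums to less than 1.
mask = rng.random((n, n)) < 0.1
S = np.where(mask, rng.uniform(-1.0, 1.0, (n, n)), 0.0) * (0.5 / n)
np.fill_diagonal(S, 0.0)

G = simulate_global(S)

# Check the defining equations: unit diagonal, and SG = G off the diagonal.
off_diag = ~np.eye(n, dtype=bool)
residual = (S @ G - G)[off_diag]
print(np.allclose(np.diag(G), 1.0), np.allclose(residual, 0.0))
```

A simulated *G* produced this way can be perturbed with noise and fed back into any inference method, which is the kind of experiment described next.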

**Unlike Barzel & Barabási, who resorted to simulating with Michaelis-Menten dynamics in their study of the performance of their approximation, using (5) we can efficiently simulate data directly from the model.** One issue with Michaelis-Menten dynamics is that they make more sense for enzymatic networks as opposed to regulatory networks (for more on this see Karlebach, G. & Shamir, R. Modelling and analysis of gene regulatory networks, *Nat Rev Mol Cell Biol* **9**, 770–780 (2008)), but in any case performance on such dynamics is hardly a validation of (2) since it's mixing apples and oranges. So what happens when one simulates data from the Barzel-Biham model and then tries to recover the parameters?

A comparison of the standard method of regularized partial correlations with exact inference for the Barzel-Biham model. Random sparse graphs were generated according to the Erdős-Rényi graph model *G(5000,p)* where *p* was varied to assess performance at different graph densities (shown on x-axis). The y-axis shows the average AUROC obtained from 75 random trials at each density.

When examining simulations from the Barzel-Biham model with graphs on 5,000 nodes (see Figure above), we were surprised to discover that when adding even small amounts of noise, **the exact algorithm (4) failed to recover the local response matrix from G** (we also analyzed the approximation (2) and observed that it always resulted in performance inferior to (4), and that 5% of the time the correlation with the exact solution was negative). This sensitivity to noise is due to the term $D(1/G^{-1})$ in the exact formula, which becomes problematic if the diagonal entries of $G^{-1}$ are close to zero.

Some intuition for the behavior of $D(1/G^{-1})$ may be gained from noting that if *S* is such that its geometric sum converges, the diagonal of $G^{-1}$ is equal to that of $I + S + S^2 + \cdots$. If *S* has mixed signs and there is significant feedback within the network, the diagonal of $G^{-1}$ may be close to zero and any noise in the measurement of *G* could create very large fluctuations in the inferred *S*. This means that the results in Figure 1 are not dependent on the graph model chosen (Erdős-Rényi) and will occur for any reasonable model of gene regulatory networks including the modeling of both enhancers and repressors. From Figure 2a in their paper, it appears that Barzel and Barabási used in their simulation an *S* with only positive entries that would preclude such effects. Such an assumption is biologically unrealistic.
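The diagonal identity in the preceding paragraph can be verified in a few lines. This is a sketch that assumes the model equations are $SG = G$ off the diagonal with $G_{ii} = 1$ and $S_{ii} = 0$; the symbols $H$ and $\Delta$ are notation introduced here for the derivation.

$$
\begin{aligned}
(I - S)\,G = \Delta \ \text{(diagonal)} \quad &\Rightarrow \quad G = H\Delta, \qquad H = (I - S)^{-1},\\
G_{ii} = 1 \quad &\Rightarrow \quad \Delta_{ii} = 1/H_{ii},\\
G^{-1} = \Delta^{-1}(I - S) \quad &\Rightarrow \quad (G^{-1})_{ii} = H_{ii}\,(1 - S_{ii}) = H_{ii},\\
H = I + S + S^2 + \cdots \quad &\Rightarrow \quad (G^{-1})_{ii} = 1 + (S^2)_{ii} + (S^3)_{ii} + \cdots
\end{aligned}
$$

The terms $(S^k)_{ii}$ collect closed walks of length $k$ through node $i$, which is where feedback enters. With mixed signs these terms can be negative and cancel part of the leading 1, pushing $(G^{-1})_{ii}$ toward zero.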

However, the difficulties with noise for the Barzel-Biham model go much deeper. While a constant signal-to-noise ratio, as assumed by Barzel and Barabási, is a commonly used model for errors in experiments, it is important to remember that **there is no experiment for directly measuring the elements of G.** Obtaining $G_{ij}$ from an experiment is done by making a small perturbation of size *e* to gene *i*, observing the change in gene *j*, and then dividing that change by *e*. This last step increases the noise on the estimate of $G_{ij}$ by a factor of *1/e* (a large number, for a perturbative experiment) above the noise already present in the measurements. Increasing *e* acts to remove the system from the perturbative regime and thereby increases the intrinsic error in estimating *G*. It is therefore the case that attaining reasonable error on *G* will require very low noise in the original measurements. In the case of biological networks this would mean performing many replicates of the experiments. However, as Barzel & Barabási acknowledge in their paper, even a single replicate of a perturbation experiment is not currently feasible.
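To make the amplification concrete, here is a small worked example. The additive noise model and the numbers are assumptions chosen for illustration, not values from either paper. Suppose the ideal response of gene *j* is $e\,G_{ij}$ and the measurement adds noise $\eta$ with standard deviation $\sigma$. Then

$$
\hat{G}_{ij} = \frac{e\,G_{ij} + \eta}{e} = G_{ij} + \frac{\eta}{e},
\qquad
\operatorname{sd}\!\left(\hat{G}_{ij} - G_{ij}\right) = \frac{\sigma}{e}.
$$

With $e = 0.01$ and $\sigma = 0.001$, the error on $\hat{G}_{ij}$ has standard deviation $0.1$, a hundred times the measurement noise. Averaging $r$ replicates only reduces this by a factor of $\sqrt{r}$.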

While the exact algorithm (4) for inverting the Barzel-Biham model performs poorly, we found that a widely used shrinkage method based on partial correlation (Schäfer, J. & Strimmer, K. A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics, *Statistical Applications in Genetics and Molecular Biology* **4**, (2005)) outperforms the exact algorithm (blue curve in Figure above). **This suggests that there is no input for which (4) might be useful.** The method is not even ideal for inference from data generated by the model it is based on.

This brings us to the “results” section of the paper. To demonstrate their method called Silencer, Barzel & Barabási ran it on only **one of three** datasets from the DREAM5 data. They then compared the performance of Silencer to **three out of thirty-five methods** benchmarked in DREAM5. **The Barzel-Biham model is for perturbation experiments, but Barzel & Barabási just threw in data from another universe (e.g. mutual information matrices).** But let's just go with that for a moment. Their results are shown in the figure below:

Figure 3 from Barzel-Barabási.

The three methods on which they tested for potential improvement are Pearson, Spearman and Mutual Information. Pearson and Spearman rank 16/35 and 18/35 respectively in the DREAM5 benchmarks. There may be some reason why Silencer applied on top of these methods improves performance: in the case where *G* is a correlation matrix, the path interpretation given by Barzel and Barabási connects the inference procedure to Sewall Wright’s path coefficients (ca. 1920), which in turn suggests an interpretation in terms of partial correlation. However in the case of mutual information, a method that is ranked 19/35 in the DREAM5 benchmarks, **there is no statistically significant improvement at all**. The improvement is from an AUROC of 0.67 to 0.68. Amazingly, Barzel and Barabási characterize these results by remarking that they “improve upon the **top**-performing inference methods” (emphasis on top is ours). Considering that the best of these ranks 16/35, the use of the word “top” seems, shall we say, unconventional.

We have to ask: **how did Barzel & Barabási get to publish a paper in the journal Nature Biotechnology on regulatory network inference without improvement or testing on anything but a handful of mediocre DREAM5 methods from a single dataset?** To put the Barzel-Barabási results in context, it is worth considering the standards the Feizi *et al.* paper was held to. In that paper the authors tested their method on DREAM5 data as well, except they tested on all 3 datasets and 9 methods (and even on a community based method). We think it's fair to conclude that significantly more testing would have to be done to argue that Silencer improves on existing methods for biological network inference.

We therefore don’t see any current practical utility for the Barzel-Biham model, except possibly for perturbation experiments in small sub-networks. Even then, we don’t believe it is practical to perform the number of experiments that would be necessary to overcome signal-to-noise problems.

Unfortunately the problems in Barzel-Barabási spill over into a follow-up article published by the duo: Barzel, Baruch, and Albert-László Barabási, “Universality in Network Dynamics.” *Nature Physics* 9 (2013). In the paper they assume that the local response matrix *S* has entries that are all positive, i.e. they do not allow for inhibitory interactions. Such a restriction immediately renders the results of the paper, if they are to be believed, moot in terms of biological significance. Moreover, the restrictions on *S* appear to be imposed in order to provide approximations to *G* that are unnecessary in light of (5). Given these immediate issues, we suspect that were we to read the Universality paper carefully, it is quite likely this post would have to be lengthened considerably.

These are not the first of Barabási’s papers to package meaningless and incoherent results in Nature/Science publications. In fact, there is a long history of Barabási publishing with fanfare in top journals only to have others respond by publishing technical comments on his papers, in many cases refuting completely the claims he makes. Unfortunately many of the critiques are not well known because they are rejected from the journals where Barabási is successful, and instead find their way to preprint servers or more specialized publications. Here is a partial list of Barabási’s finest and the response(s):

- Barabási is famous for the “BA model”, proposed in Barabási and Albert, “Emergence of Scaling in Random Networks”, **Science**, Vol. 286, 15 October 1999, pp. 509-512. Lada Adamic and Bernardo Huberman immediately refuted the practical applications of the model. Moreover, as pointed out by Willinger, Alderson and Doyle, while it is true that scale-free networks exhibit some interesting mathematical properties (specifically they are resilient to random attack yet vulnerable to worst-case attack), even the math was not done by Barabási but by the combinatorialists Bollobás and Riordan.
- Barabási has repeatedly claimed metabolic networks as prime examples of scale-free networks, starting with the paper Jeong *et al.*, “The large-scale organization of metabolic networks”, **Nature** 407 (2000). This claim has been disputed and refuted in the paper Scale-*Rich* Metabolic Networks by Reiko Tanaka.
- The issue of attack tolerance was the focus of Error and attack tolerance of complex networks by Réka Albert, Hawoong Jeong & Albert-László Barabási in **Nature** 406 (2000). John Doyle refuted the paper completely in this paper.
- In the paper “The origin of bursts and heavy tails in human dynamics” published in **Nature** 435 (2005) Barabási pretends to offer insights into the “bursty nature of human behavior” (by analyzing e-mail). In a follow-up comment Daniel Stouffer, Dean Malmgren and Luis Amaral demonstrate that the reported power-law distributions are solely an artifact of the analysis of the empirical data and the proposed model is not representative of e-mail communication patterns.
- Venturing into the field of control theory, the paper “Controllability of complex networks” by Liu, Slotine and Barabási, **Nature** 473 (2011) argues that “sparse inhomogeneous networks, which emerge in many real complex systems, are the most difficult to control, but that dense and homogeneous networks can be controlled using a few driver nodes.” Not so. In a beautiful and strong rebuttal, Carl Bergstrom and colleagues show that a single control input applied to the power dominating set is all that is required for structural controllability of most, if not all networks. I have also blogged about this particular paper previously, explaining the Bergstrom result and why it reveals the Barabási paper to be a control theory embarrassment.

In other words, Barabási’s “work” is a regular feature in the journals Nature and Science despite the fact that many eminent scientists keep demonstrating that the network emperor has no clothes.

*Post scriptum*. After their paper was published, Barzel and Barabási issued a press release claiming that “their research moves the team a step closer in its quest to understand, predict, and control human disease.” The advertisement seems like an excellent candidate for Michael Eisen’s pressies.

## 18 comments

February 10, 2014 at 8:50 am

Pedro Mendes (@gepasi)

This work was not even new. In 2002 two very similar methods were published: http://www.ncbi.nlm.nih.gov/pubmed/12142007 & http://www.ncbi.nlm.nih.gov/pubmed/12242336 . Recently they have been applied experimentally: http://www.ncbi.nlm.nih.gov/pubmed/17310240

February 11, 2014 at 5:16 am

pmelsted

The Bollobás, Riordan paper is a beautiful example of coupling graphs. They show that within every preferential attachment graph there is a large sparse random graph and each PA graph has a large part contained inside a sparse random graph. You won’t see any such insight into the underlying random structure in B’s papers.

February 11, 2014 at 2:16 pm

Elchanan Mossel

Critical discussions are good. It’s a shame that “leading” journals do not publish critique, replications (including when things do not replicate) and papers pointing to methodological errors. Given the lack of appropriate forum it is good you publish critique on your blog. Did you ask the authors to take a look and defend themselves?

From a wider perspective, how come everybody knows that the leading science journals are full of bad science and still everybody want to publish there?

February 12, 2014 at 3:22 am

Gareth

“It is not a case of choosing those [faces] that, to the best of one’s judgment, are really the prettiest, nor even those that average opinion genuinely thinks the prettiest. We have reached the third degree where we devote our intelligences to anticipating what average opinion expects the average opinion to be. And there are some, I believe, who practice the fourth, fifth and higher degrees.” (Keynes, General Theory of Employment Interest and Money, 1936). (From Wikipedia.)

As long as we all believe that everybody wants to publish there, we will all want to publish there, right?

February 12, 2014 at 4:19 am

Lior Pachter

Thanks Elchanan.

To answer your question, we submitted a truncated version of this blog post to Nature Biotechnology almost 5 months ago. The two page comment has not yet been decided on by the journal, and my understanding is that a major reason has been the delay of Barzel and Barabási in responding to it. So they did get a chance to respond, but to be honest we just gave up on waiting. It’s still possible that Nature Biotechnology will publish our comment and their response. I don’t know, and I don’t particularly care at this point. The authors are of course welcome to post replies in this comments section.

February 18, 2014 at 9:30 pm

Andy Jerkins

Hey Lior,

Can you post this two page comment, like you did for Manolis Kellis’ paper, or is that the same comment?

February 18, 2014 at 11:37 pm

Lior Pachter

We have a separate comment for this paper and it is still under review at Nature Biotechnology. It follows the post closely, and I plan to post it in the near future.

February 13, 2014 at 8:56 am

Guilhem

Thank you for an excellent read. I had myself struggled a lot with the issue of noise when trying their approach in R, and rapidly concluded that the method was simply not usable for any of the biological networks and correlation networks I was interested in. It is astonishing that a method so overly sensitive to noise and multicollinearity is being sold as applicable to biological data.

I think that the following paper by Lima-Mendez and van Helden would be a nice addition to your list of responses to previously published claims:

http://pubs.rsc.org/en/content/articlehtml/2009/mb/b908681a

February 18, 2014 at 12:49 pm

Christos Ouzounis

See also, http://www.ncbi.nlm.nih.gov/pubmed/15496552 our modest contribution to this controversial / trendy subject, years ago.

February 19, 2014 at 4:53 pm

ALB is misleading.

I found the ALB paper very misleading in terms of their claimed contributions. The paper looks pretty but the content is very hollow, not to mention there are technical issues. I showed the paper to one of the students, and he quickly spotted the derivation was nonsense. The journal should clarify why this paper is accepted.

February 26, 2014 at 5:42 am

Yu Xue

Hi, I am a Bioinformatician in China, who is working on computational analysis of post-translational modifications in proteins. My research field is a little bit cross with network analysis. I leave my personal comments below:

1. I really really hate Lior’s blogs, because the three blogs cost me two weeks to understand the major viewpoints. Lior’s great match really scared me. So I have to improve my math for a better understanding, and have to discuss with my friends.

2. Great thanks to write the blogs. My friends and I truly respect you, and what you are doing. Everybody clearly know that you will get no profit from doing so, but you did. You are a great man and a real computational biologists.

3. After two weeks reading and thinking, I wrote three blogs in Chinese, for researchers in China, to introduce Lior’s blogs, as below:

(1) http://blog.sciencenet.cn/home.php?mod=space&uid=404304&do=blog&id=770977

(2) http://blog.sciencenet.cn/home.php?mod=space&uid=404304&do=blog&id=771158

(3) http://blog.sciencenet.cn/home.php?mod=space&uid=404304&do=blog&id=771207

Sorry that I wrote the blogs in Chinese, but I tried my best to keep the major viewpoints of Lior. So Chinese scientists can take a look on it, if you do not want to read English. I will apologize if some misunderstandings were made, because it’s too difficult to me for the math.

It looks like the overhype is quite common in academic society. I am offen stranged that some really poor work can be published in top journals, but noboday dare to stand out. So, the Bioinformatics field should thank for Lior’s criticisms, which is much, much, much helpful for the current and future of the field.

February 27, 2014 at 9:23 am

Junpeng Lao

I am afraid you had made a mistake in your Chinese blog, especially the first one. The scaling parameter is for the eigenvalue but not for the sequence as you explained. You should give a careful read of Lior’s new blog post https://liorpachter.wordpress.com/2014/02/18/number-deconvolution/ for details.

February 28, 2014 at 7:25 am

Jack B.

Lior

You are not the first person to notice that ALB papers were mostly BS.

Same with the Medard and Kellis paper. In fact, the entire literature is full of these kinds of BS. The problem is that people do research to get grants and not the other way around.

But you are the first that had the “balls” to openly announce these findings. I do not have the same amount of courage as you do (and I commend you on that).

March 7, 2014 at 11:15 pm

village-idiot

Yes, there was a short piece in Science not too long ago about this:

http://www.sciencemag.org/content/335/6069/665

But the cottage industry that Barbasi fathered has spun so much out of control that it is unstoppable.

September 27, 2014 at 3:30 pm

Ising Model

This guy should become an NAS and NAE member! He markets his junk quite well.

January 3, 2015 at 3:25 pm

Tom Chou

BTW, comparing oneself to Chandrasekar should be done with care since Edmund Stoner had scooped (and did a more thorough calculation) Chandrasekar but did not get the notoriety for some “reason”. Moreover, Chandrasekar it note cite Stoner, even though the evidence is that he knew about the earlier work.

So, why stop at NAS/NAE….go straight to Stockholm!

January 3, 2015 at 3:50 pm

Tom Chou

Typo…I meant Chandrasekar DID NOT cite Stoner….

January 10, 2015 at 3:44 pm

Nikola

I’m so glad I came across your post today and find out about Barabási’s history of nonsense papers.

He published two closely related papers in Science (http://www.sciencemag.org/content/327/5968/1018) and Nature (http://www.nature.com/nature/journal/v484/n7392/full/nature10856.html#close) on migration and mobility. I am a population geographer and migration researcher and wondered why I could not make sense of a paper published in a prestigious journal. The findings don’t look plausible at all to me, and the model cannot be replicated because the data haven’t been made available…. Anyway, now I know what to think of it.