You are currently browsing the monthly archive for February 2014.

Given the reaction to our use of the word “fraud” in our blog post on Feizi et al., we would like to remind readers how it was actually used:

In academia the word fraudulent is usually reserved for outright forgery. However given what appears to be deliberate hiding, twisting and torturing of the facts by Feizi et al., we think that fraud (“deception deliberately practiced in order to secure unfair or unlawful gain”) is a reasonable characterization. If the paper had been presented in a straightforward manner, would it have been accepted by Nature Biotechnology?

While the reaction has largely focused on reproducibility and the swapping of figures, we reiterate our stance: misleading one’s readers (and reviewers) is itself a form of scientific fraud. Regarding the other questions raised, the response from Feizi, Marbach, Médard and Kellis falls short. On their new website their code has changed, their explanations are in many cases incoherent, self-contradictory, and make false claims, and the newly added correction to Figure S4 turns out not to explain the difference between the figures in the revisions. We explain all of this below. And yet for us the claim of fraud can stand on the basis of deception alone. The distance between the image created by the main text of the paper and the truth of their method is simply too great.

However, one unfortunate fact is that that judging that distance requires understanding some of the mathematics in the paper, and another common reaction has been “I don’t understand the math.”  For such readers, we explain network deconvolution in simple terms by providing an analogy using numbers instead of matrices. Understanding this requires no more than high school algebra. We follow that with a response to Feizi et al.‘s rebuttal.

In what follows, number deconvolution, a simplified form of network deconvolution, is explained in red. Network deconvolution is explained in blue.

The main concept to understand is that of a geometric series. This is a series with a constant ratio between successive terms, for example

$\displaystyle \frac{1}{2}+\frac{1}{4}+\frac{1}{8} + \cdots \ \ \ \ \ (1)$

In this case, each term is one half the previous term, so that the ratio between successive terms is always ${\frac{1}{2}}$. This sum converges to ${1}$, an intuitive notion that is formalized by defining the infinite sum to be

$\displaystyle \sum_{i=1}^{\infty} \left( \frac{1}{2}\right)^i = lim_{n \rightarrow \infty} \sum_{i=1}^n \left( \frac{1}{2} \right)^i = 1. \ \ \ \ \ (2)$

This is the familiar math of the

It is important to note that not every geometric series converges. If the number ${\frac{1}{2}}$ in the sum above is replaced by ${2}$, then the series ${2+4+8+16+\cdots}$ is said to diverge. The are two basic questions about geometric series: (1) for which numbers do they converge and (2) when they do converge, what do they converge to. These two questions are answered in the following:

Let ${a}$ be a real number. The geometric series converges if, and only if, ${-1 < a < 1}$, in which case

$\displaystyle m=\sum_{i=1}^{\infty} a^i = a(1-a)^{-1}. \ \ \ \ \ (3)$

It is not hard to see why this is true:

$\displaystyle m=a+a^2+\cdots = a(1+a+a^2+\cdots) = a(1+m) \Rightarrow m=a(1-a)^{-1}.$

Returning to the example of ${a=\frac{1}{2}}$, we see that it is a special case of this result. It converges because ${-1 < \frac{1}{2} < 1}$, and furthermore, ${\frac{\frac{1}{2}}{1-\frac{1}{2}} = 1}$.

Matrices behave like numbers in some ways, for example they can be added and multiplied, yet quite differently in others. The generalization of geometric series as in Feizi et al. specifically deals with diagonalizable matrices, a class of matrices which has many convenient properties. Let ${A}$ be such a matrix. The geometric series converges if, and only if, the eigenvalues of ${A}$ (numbers associated to the matrix) lie strictly between ${-1}$ and ${1}$, in which case

$\displaystyle M = \sum_{i=1}^{\infty} A^i = A \cdot (I-A)^{-1} \ \ \ \ \ (4)$

Notice that what has changed is that the condition that ${-1 has been replaced by an eigenvalue restriction on m and the formula  $A(I-A)^{-1}$ is just like $a(1-a)^{-1}$ except the operation of matrix inversion is required instead of number inversion. The result is obvious from elementary linear algebra because the assumption that ${A}$ is diagonalizable means that ${A=UDU^{-1}}$ for some matrix ${U}$, where ${D}$ is a diagonal matrix with the eigenvalues of ${A}$ on the diagonal. Therefore ${A^k=UD^kU^{-1}}$ and the series converges if, and only if, the geometric series formed by summing the powers of each eigenvalue converge.

In following Feizi et al., we call the number ${\frac{1}{2}}$ in the example above the “direct number”, and the remainder of the sum the “indirect number”. One should think of the indirect number as consisting of echoes of the direct number. Furthermore, following the language in Feizi et al., we call the ratio between terms, again the number ${\frac{1}{2}}$ in the example above, the indirect flow. When it is strong, the terms decrease slowly. When it is weak, the terms decrease rapidly.

In Feizi et al., the matrix A is called the “direct effect”, and the remainder of the sum the “indirect effect”. One should think of the indirect effect as consisting of echoes of the direct effect. Furthermore, following the language in Feizi et al., we call the rate at which the terms in the indirect effect decay the indirect flow. When it is strong, the terms decrease slowly. When it is weak, the terms decrease rapidly.

In number deconvolution the goal is to infer ${a}$ from ${m}$. That is, one would like to remove the indirect effect, the echoes, out of ${m}$. If ${m=a(1-a)^{-1}}$, then solving for ${a}$ we obtain

$\displaystyle a=m(1+m)^{-1}. \ \ \ \ \ (5)$

That is the formula for number deconvolution.

In network deconvolution the goal is to infer  ${A}$ from ${M}$. That is, one would like to remove the indirect effect, the echoes, out of ${M}$ as follows: If ${M=A(I-A)^{-1}}$ then solving for ${A}$ we obtain

$\displaystyle A = M \cdot (I+M)^{-1} \ \ \ \ \ (6)$

That is the formula for network deconvolution.

Everything looks great but there is a problem. Even though one can plug any number ${m}$ (except ${m=-1}$) into the formula and get out an ${a}$, the formula only gives an ${a}$ satisfying ${\sum_{i=1}^{\infty} a^i = m}$ when ${m > -\frac{1}{2}}$. For example, starting with ${m=-2}$ and then plugging it into number deconvolution one gets ${a=2}$. But ${2+4+ 8+\cdots}$ diverges.

The restriction on ${m}$ in number deconvolution, namely that it has to be bigger than ${-\frac{1}{2}}$, can be translated into the same restriction on the eigenvalues of ${M}$ for network deconvolution. This follows from the fact that ${M \, = \, \sum_{i=1}^{\infty}A^i \, = \, \sum_{i=1}^{\infty}UD^iU^{-1}}$ means that if ${\lambda}$ is an eigenvalue of ${A}$ then ${\mu :=\frac{\lambda}{1-\lambda}}$ is an eigenvalue of ${M}$ and all the eigenvalues of ${M}$ arise in this way. Therefore, ${\lambda = \frac{\mu}{1+\mu}}$ and the condition ${-1 < \lambda < 1}$ holds if, and only if, ${\mu>-\frac{1}{2}}$

But what if we wanted number deconvolution to work for all negative numbers? One idea is to follow Feizi et al.’s approach and introduce a scaling factor called ${\gamma}$ to be applied to ${m}$, so that the product ${\gamma m}$ can be deconvolved. For example, suppose we start with ${m=-12}$. We’d like to find ${a}$. But we can’t just apply number deconvolution. So we multiply ${m}$ by ${-\frac{1}{12}}$ to get ${1}$, and then deconvolve that to get ${a=\frac{1}{2}}$. We could have multiplied ${m}$ by something else, say ${\frac{-1}{6}}$ in which case we’d get ${a=\frac{2}{3}}$. In fact, there are infinitely many scaling factors, giving infinitely many solutions ${a}$. In fact, we can get any ${a}$ between ${-1}$ and ${1}$.

When ${M}$ is not writable as a geometric sum (decomposable), there exist scaling factors ${\gamma}$ such that the scaled matrix ${\gamma M}$ is decomposable. This is obvious because if ${\mu_{-} =\mu_1 \leq \ldots \leq \mu_n = \mu_{+}}$ are the eigenvalues of ${M}$, then ${\gamma \mu_1,\ldots,\gamma \mu_n}$ are the eigenvalues of ${\gamma M}$ so by choosing ${0 \leq \gamma < |\frac{1}{2\mu_{-}}|}$ we are guaranteed that ${\gamma M}$ is decomposable. Let ${M}$ be a real symmetric matrix with minimum/maximum eigenvalues ${\mu_{-}}$ and ${\mu_{+}}$ respectively and ${0<\beta<1}$. If

$\displaystyle \gamma = \frac{\beta}{max\left((1-\beta)\mu_{+},(-1-\beta)\mu_{-}\right)} \ \ \ \ \ (7)$

then the matrix ${\gamma M}$ is decomposable as ${\gamma M = \sum_{i=1}^{\infty}A^i}$. Furthermore, if ${\lambda_{-},\lambda_{+}}$ are the minimum/maximum eigenvalues of ${A}$ respectively then ${max(|\lambda_{-}|,|\lambda_{+}|) = \beta}$ (we omit the derivation, but it is straightforward).

Matrices do behave differently than numbers, and so it is not true that network deconvolution can produce any matrix as an output. However, as with number deconvolution, it remains true that network deconvolution can produce an infinite number of possible matrices.

Not only do Feizi et al. not mention this anywhere in the main text, instead they create the impression that there is a single output from any input.

Feizi et al.’s response

We now turn to the response of Feizi et al. to our post. First, upon initial inspection of the “Try It Out” site, we were astounded to find that the code being distributed there is not the same as the code distributed with the paper (in particular, step 1 has been removed) and so it seems unlikely that it could be used to replicate the paper’s results. Unless, that is, the code originally distributed with the paper was not the code used to generate its results. Second, the “one click reproduction”s that the authors have now added do not actually start with the original data but instead merely produce plots from some results without showing how those results were generated. This is not what reproducibility means. Third, while the one matrix of results from a “one click reproduction” that we looked at (DI_ND_1bkr.mat) was very close to the matrix originally distributed with the paper, it was slightly different. It was close and hopefully it does generate basically the same figure, but as we explain below, we’ve spent a good bit of time on this and have no desire to spend any more. This is why we have tried to incentivize someone to simply reproduce the results.

In retrospect, we regret not explaining exactly how we came to be so skeptical about the reproducibility of the results in the Feizi et al. paper. While we very quickly (yet not instantly) recognized how misleading the paper is, our initial goal in looking at their results was not to verify reproducibility (which we assumed) but rather to explore how changing the parameters affected the results. Verifying that we could recover the results from the paper was only supposed to be the first step in this process. We downloaded the file of datasets and code provided by the authors and began examining the protein contact dataset.

After writing scripts that we verified exactly regenerated some figures from the paper from the output matrices distributed by the authors, we then checked whether we arrived at the same results when running the ND code on what we assumed was the input data (files with names like “MI_1bkr.txt” while the output was named “MI_ND_1bkr.txt”). We were surprised when the output did not match, however we were then informed by the authors that they had not distributed the input data but rather thresholded versions of the input. When we asked the authors to provide us with the actual data, we were told that it would violate scientific etiquette to send us another scientist’s dataset and that we would have to regenerate it ourselves. We had never heard this claimed point of etiquette before, but acquiesced and attempted to do so. However, the matrices produced from the original data were actually a different size than those used by the authors (suggesting that they had used a different alignment).

Stymied on that front, we turned to the DREAM5 dataset instead. While the actual scoring that goes into the DREAM analysis is fairly complicated, we decided that we would start by merely checking that we could regenerate the output provided by the authors. We eventually received step-by-step instructions for doing so and, because those steps did not produce the provided output, asked a simple question suggested by our experience with protein contact dataset: to generate the provided results, should we use as input the provided non-ND matrices? We asked this question four times in total without receiving a reply. We sent our script attempting to implement their steps and received no response.

At some point, we also discovered that the authors had been using different parameter values for different datasets without disclosing that to their readers. They could provide no coherent explanation for doing so. And sometime after this, we found that the authors had removed the data files from their website: readers were directed instead to acquire the datasets elsewhere and the results from the paper were no longer provided there.

At this point, we expect it is obvious why we were skeptical about the reproducibility of the results. Having said that, we never wanted to believe that the results of the paper were not reproducible. It was not our initial assumption and we still hope to be proven wrong. However, we continue to wait, and the fact that the code has been changed yet again, and that the authors’ explanation in regards to figure S4 does not appear to explain its changes (see below) do not fill us with confidence.

We repeat that while most of the discussion above has focused on replicability, we will quote again the definition of “fraud” used in the original post: “deception deliberately practiced in order to secure unfair or unlawful gain”. Our main point is this: would any honest reader of the main text of the original paper recognize the actual method implemented? We don’t think so, hence deception. And we believe the authors deliberately wrote the paper in this way to unfairly gain acceptance at Nature Biotechnology.

Now, in response to the authors rebuttal, we offer the following (Feizi et al. remarks in italics):

Point-by-point rebuttal

Appendix A

The pre-processing step that Bray and Pachter criticize (step 1 in their description of our work) has no effect on the performance of ND. Mathematically, because our matrices are non-negative with min zero, a linear scaling of the network results in a similar scaling of its eigenvalues, which are normalized during eigenvalue scaling, canceling out the linear scaling of step 1. Practically, removing step 1 from the code has little effect on the performance of ND on the DREAM5 regulatory networks, as we show in Figure 1 below.

We cannot imagine how the authors came to make the statement “our matrices are non-negative with min zero”. After all, one of the inputs they used in their paper was correlation matrices and they surely know that correlations can be negative. If the map in step 1 were linear rather than – as we correctly stated in the document they are responding to – affine (for those unfamiliar with the terms, “linear” here means only scaling values while “affine” means scaling and shifting as well), then this explanation would be correct and it would have no effect. This just makes it even stranger that immediately after giving this explanation, the authors produced a graph showing the (sometimes quite significant) effect of what we assume was the affine mapping.

Stranger still is that after this defense, Step 1 has now been removed from the code.

Appendix B

The eigenvalue scaling (step 5) is essential both theoretically, to guarantee the convergence of the Taylor series, and practically, as performance decreases without it in practice. This step is clearly stated in both the main text of our paper and in the supplement, despite the claims by Bray and Pachter that it somehow was maliciously concealed from our description of the method (just search the manuscript for the word ‘scaling’). Our empirical results confirm that this step is also necessary in practice, as ND without eigenvalue scaling is consistently performing worse, as we show in Figure 2 below:

We do not understand why the author’s felt the need to defend the use of eigenvalue scaling when we have never suggested it should be removed. Our objection was rather to how it was presented (or not) to the reader. Yes, there is a mention of the word “scaling”. But it is done so in a way that makes it sound as though it were some trivial issue: there’s an assumption, you scale, assumption satisfied, done. And this mention occurs in the context of the rest of the main text of the paper which presents a clear image of a simple, parameterless, 100% theoretically justified method: you have a matrix, you put it into the method, you get the output, done (and it is “globally optimal”). In other words, it is the math we discuss above prior to the arrival of the parameter $\gamma$.

Yes, scaling is dealt with at length in the supplement. Perhaps Feizi et al. think it is fine for the main text of a paper to convey such a different image from the truth contained in its supplement and in the code. We do not and we would hope the rest of the scientific community agrees with us.

Appendix C: Robustness to input parameters.

Bray and Pachter claim that “the reason for covering up the existence of parameters is that the parameters were tuned to obtain the results”. Once again the claims are incorrect and unfounded. (1) First, we did not cover up the existence of the parameters. (2) Second, we used exactly the exact same set of parameters for all 27 tested regulatory networks in the DREAM challenge. (3) Third, we show here that our results are robust to the parameter values, using β = 0.5 and β = 0.99 for the DREAM5 network.

(1) Again: in the main text of the paper, there is not even a hint that the method could have parameters. This is highly relevant information that a reader who is looking at its performance on datasets should have access to. That reader could then ask, e.g. what parameters were used and how they were selected. They would even have the ability to wonder if the parameters were tuned to improve apparent performance. It is worth noting that when the paper was peer reviewed, the reviewers did not have access to the actual values of the parameters used.

(2) It is, of course, perfectly possible to tune parameters while using those same parameters on multiple datasets.

(3) The authors seem to have forgotten that there are two parameters to the method and that they gave the other parameter (alpha) different values on different datasets as well. They also have omitted their performance on the “Community” dataset for some reason.

Appendix D: Network Deconvolution vs. Partial Correlation.

Bray and Pachter compare network deconvolution to partial correlation using a test dataset built using a partial correlation model. In this very artificial setting, it is thus not surprising that partial correlation performs better, as it exactly matches the process that generated the data in the first place. To demonstrate the superiority of partial correlation, Bray and Pachter should test it on real datasets, such as the ones provided as part of the DREAM5 benchmarks . In our experience, partial correlation performed very poorly in practice, but we encourage Bray and Pachter to try it for themselves.

We looked at partial correlations here for a very simple reason: the authors originally claimed that their method reduced to it in the context we considered here. Thus, comparing them seemed a natural thing to do.

The idea that we should look at the performance of partial correlations in other contexts makes no sense: we have never claimed that it is the right tool to solve every problem. Indeed, it was the authors who claimed that their tool was “widely applicable”. By arguing for domain specific tools, they seem to be making our point for us.

Claim 1: “the method used to obtain the results in the paper is completely different than the idealized version sold in the main text of the paper”.

The paper clearly describes both the key matrix operation (“step 6” in Lior Pachter’s blog post) which is shown in figure 1, and all the pre- and post-processing steps, which are all part of our method. There is nothing mischievous about including these pre- and post-processing steps that were clearly defined, well described, and implemented in the provided code.

The statement that the paper describes all steps is simply false. For example, the affine mapping appears nowhere.

Claim 2: “the method actually used has parameters that need to be set, yet no approach to setting them is provided”.

It is unfortunate that so many methods in our field have parameters associated with them, but we are not the first method to have them. However, we do provide guidelines for setting these in the supplement.

It is strange to suggest that it is unfortunate that methods have parameters. It is also strange to describe their guidelines as such when they did not even follow them. Also, there is no guideline for setting alpha in the supplement. In their FAQ, they say “we used beta=0.5 as we expected indirect flows to be propagating more slowly”. We wonder where one derives these expectations from.

Dr. Pachter also points out that a correction to Supplement Figure S4 on August 26th 2013 was not fully documented. We apologize for the omission and provide additional details in an updated correction notice dated February 12, 2014.

The correction notice states that the original Figure S4 was plotted with the incorrect x-axis. In particular it states that the maximum eigenvalue of the observed network was used, instead of the maximum eigenvalue of the direct network. We checked this correction, and have produced below the old curve and the new curve plotted on the same x-axis:

The new and old Figure S4. The old curve, remapped into the correct x-axis coordinates is shown in red. The new curve is shown in blue. Raw data was extracted from the supplement PDFs using WebPlotDigitizer.

As can be seen, the two curves are not the same. While it is expected that due to the transformation the red curve should not cover the whole x-axis, if the only difference was the choice of coordinates, the curves should overlap exactly. We cannot explain the discrepancy.

This is the third and final post in a series (part1, part2) of posts on two back-to-back papers published in Nature Biotechnology in August 2013:

1. Baruch Barzel & Albert-László Barabási, Network link prediction by global silencing of indirect correlationsNature Biotechnology 31(8), 2013, p 720–725. doi:10.1038/nbt.2601
2. Soheil Feizi, Daniel Marbach, Muriel Médard & Manolis Kellis, Network deconvolution as a general method to distinguish direct dependencies in networksNature Biotechnology 31(8), 2013, p 726–733. doi:10.1038/nbt.2635

An inconvenient request

One of the great things about conferences is that there is time to chat in person with distant friends and collaborators. Last July, at the ISMB conference in Berlin, I found myself doing just that during one of the coffee breaks. Suddenly, Manolis Kellis approached me and asked to talk in private. The reason for his interruption: he came to request that I remove an arXiv post of mine, namely “Comment on ‘Evidence of Abundant and Purifying Selection in Humans for Recently Acquired Regulatory Functions“, a response to a paper by Ward and Kellis. Why? He pointed out that my arXiv post was ranking highly on Google. This was inconvenient for him, he explained, while insisting that it was wrong of me to post a criticism of his work on a forum where he could not directly respond. He suggested that it would be best to work out any issues I might have with his paper offline. Unfortunately,  there was the inconvenient truth that arXiv postings cannot be removed. Unlike some journals, where, say, a supplement can be revised while having the original removed (see the figure switching of Feizi et al.), arXiv preprints are permanent.

My initial confusion quickly turned to anger. After all, my arXiv comment had been rejected from Science where I had submitted it as a technical comment on the Ward-Kellis paper. I had then put it on the arXiv as a last resort measure to at least have some record of my concerns publicly accessible. How is this wrong? Can one not critique the work of Manolis Kellis?

Network nonsense begins

My first review of a Manolis Kellis paper was in the fall of 2006, in my capacity as a program committee member of the  Research in Computational Molecular Biology (RECOMB) conference held in Oakland, CA in 2007. Because Oakland is right next to Berkeley, a number of Berkeley professors were involved in organizing and running the conference. Terry Speed was chair of the program committee. I was out of the country that year on sabbatical at the University of Oxford, so I could not participate, or even attend, the conference, but I volunteered to serve on the program committee. For those not familiar with the RECOMB review process, it is modeled after the standard Computer Science conferences. The program committee chair forms the program committee, who are then assigned a handful of papers to review and score. Reviews are entered on a website, and after a brief period of online discussion about borderline papers, scores are revised and accept/reject decisions are made. Authors can revise their manuscripts, but the reviewers never see the papers again before publication in the proceedings.

One of the papers I was assigned to review was by a student named Joshua Grochow and his advisor Manolis Kellis. The paper was titled “Network Motif Discovery Using Subgraph Enumeration and Symmetry-Breaking“. Although networks were not my research focus at the time, and “symmetry-breaking” evoked in me nightmares from the physics underworld, I agreed to the review. The paper seemed to contain some interesting algorithms, appeared to have a combinatorial flavor, and potentially important applications- a good mix for RECOMB.

The problem addressed by Grochow & Kellis was that of identifying “network motifs” in biological networks. “Motifs” can be defined in a variety of ways, and the Grochow-Kellis objective was simple. In graph theoretic terms, given a graph G, the goal was to find subgraphs occurring with high multiplicity to an extent unlikely in a random graph. There are many models for random graphs, and the one that the results in Grochow-Kellis are based on is the Erdös-Renyi model (each edge chosen independently with some fixed probability). The reason this definition might be of biological interest, is that recurrent motifs interspersed in a graph are likely to represent evolutionarily conserved interaction modules.

The paper begins with a description of the method. I won’t go into the details here, except to say that everything seemed well until I read the caption of Figure 3. There the number 27,720 caught my eye.

During my first few years of graduate school I took many courses on enumerative and algebraic combinatorics. There are some numbers that combinatorialists just “know”. For example, seeing 42 emerge as the answer to a counting problem does not bring to mind Douglas Adams, but rather the vast literature on Catalan numbers and their connections to dozens of well-known counting problems. Similarly, a number such as 126 brings to mind binomial coefficients ($126={9 \choose 4}$), and with them the idea of counting the number of subsets of fixed size from a larger set. When I saw the number 27,720 I had a hunch that somehow some canonical combinatorial set had been enumerated. The idea may have entered my mind because of the picture of the “motif” in which I immediately recognized a clique (all vertices mutually connected) and a stable set (no pair of vertices connected). In any case, I realized that

$27,720 = 220 \cdot 126 = {12 \choose 3} \cdot {9 \choose 4}$.

The significance of this is that the “motif” on the left-hand side of Figure 3 had appeared many times  because of a type of double- or rather thousandfold- counting. Instead of representing statistically significant recurring independent motifs, this “motif” arises because of a combinatorial artifact. In the specific example of Figure 3, the motif occurred once for any choice of 4 nodes from the clique of size 9, and any choice of 3 nodes from the stable set of size 12.

The point is that in a graph, any subgraph attached to a large clique (or stable set) will occur many times. This simple observation is a result of the fact that there are many subgraphs of a clique (or stable set) that are identical. I realized that this meant that the Grochow-Kellis method was essentially a heuristic for finding cliques and stable sets in graphs. The particular “network motifs” they were pulling out were just subgraphs that happened to be connected to some large cliques and stable sets. There are two problems with this: first, a clique or a stable set can hardly be considered an interesting “network motif”. Moreover, the fact that they appear in biological networks much more than in Erdös-Renyi random graphs is not surprising. Second, there is a large literature on finding cliques in graphs, none of which Grochow-Kellis cited or seemed to be familiar with.

The question of the performance of the Grochow-Kellis algorithm is answered in their Figure 3 as well. There is a slightly larger motif consisting of nodes from the stable set of size 12, instead of 3. That motif occurs in all ${12 \choose 6}$ subsets of the stable set instead of ${12 \choose 3}$ subsets which means that there is a motif that occurs 116,424 times! Grochow and Kellis’s algorithm did not even achieve its stated goal. It really ought to have outputted the left hand side figure with six nodes in the stable set on the left, and not three. In other words, this was a paper providing uninteresting solutions from a biological point of view, and doing so poorly to boot.

I wrote up a detailed report on the paper, and posted it on the RECOMB review website together with poor scores reflecting my opinion that the paper had to be rejected. How could RECOMB, ostensibly the premier computer science conference on computational and algorithmic biology, publish a paper with neither a computational nor biological result? Not to mention an algorithm that demonstratably did not find the most frequently occurring motif.

As you might already guess, my rejection was subsequently overruled. I don’t know who made the final decision to accept the Grochow & Kellis paper to the RECOMB conference, although presumably it was the program committee chair. The decision jarred with my sense of scientific integrity. I had put considerable effort into reviewing the paper and understanding it, and I felt that I had provided a compelling objective argument for why the paper was fundamentally flawed- the fact that the results were trivial (and incorrect!) was not a subjective statement. At this point I need to point out that the RECOMB conference is quite difficult to get into. The acceptance rate for papers in 2007, consistent with other years, was 21.8%. I knew this meant that even a single very negative review, especially one with a compelling argument against the paper, almost certainly would lead to rejection of the paper. So I couldn’t understand then, nor do I still understand now, on what basis the decision was made to accept the paper. This bothered me greatly, and after much deliberation I started boycotting the conference. Despite publishing five RECOMB papers from 2000 to 2006 and regularly attending the meeting during that time, the continued poor decisions and haphazard standards for papers selected have led me to not return in almost 8 years.

Grochow and Kellis obviously received my review and considered how to “deal with it”. They added a section titled “The role of combinatorial effects”, in which they explained the origins of the number 27,720 that they gleaned from my report, but then spun the bad news they had received as “resulting from combinatorial connectivity patterns prevalent in larger network structures.” They then added that “…this combinatorial clustering effect brings into question the current definition of network motif” and proposed that “additional statistics…might well be suited to identify larger meaningful networks.” This is a lot like someone claiming to discover a bacteria whose DNA is arsenic-based and upon being told by others that the “discovery” is incorrect – in fact, that very bacteria seeks out phosphorous – responding that this is “really helpful” and that it “raises lots of new interesting open questions” about how arsenate gets into cells. Chutzpah. When you discover your work is flawed, the correct response is to retract it.

I don’t think people read papers very carefully. Joshua Grochow went on to win the MIT Charles and Jennifer Johnson Outstanding M. Eng. Thesis Award for his RECOMB work on network motif discovery. [Added February 18: Grochow and Kellis have posted a reply here].

The nature of man

I have to admit that after the Grochow-Kellis paper I was a bit skeptical of Kellis’ work. Not because of the paper itself (everyone makes mistakes), but because of the way he responded to my review. So a year and a half ago, when Manolis Kellis published a paper in an area I care about and am involved in, I may have had a negative prior. The paper was Luke Ward and Manolis Kellis “Evidence for Abundant and Purifying Selection in Humans for Recently Acquired Regulatory Functions”, Science 337 (2012) . Having been involved with the ENCODE pilot, where I contributed to the multiple alignment sub-project, I was curious what comparative genomics insights the full-scale $130 million dollar project revealed. The press releases accompanying the Ward-Kellis paper (e.g. The Nature of Man, The Economist) were suggesting that Ward and Kellis had figured out what makes a human a human; my curiosity was understandably piqued. Ward and Kellis combined population genomic data from the 1000 Genomes Project with biochemical data from the ENCODE project to look for signatures of human constraint in regulatory elements. Their analysis was based on measuring three different proxies for constraint: SNP density, heterozygosity and derived allele frequency. To identify specific classes of regulatory regions under constraint, aggregated regions associated with specific gene ontology (GO) categories were tested for significance. Reading the paper I was amazed to discover they found precisely two categories: retinal cone cell development and nerve growth factor receptor signaling. It was only upon reading the supplement that I discovered that their tests had produced 53 other GO categories as well (Table S5). Despite the fact that the listed categories were required to pass a false discovery rate (FDR) threshold for both the heterozygosity and derived allele frequency (DAF) measures, it was statistically invalid for them to highlight any specific GO category. FDR control merely guarantees a low false discovery rate among the entries in the entire list. Moreover, there was no obvious explanation for why categories such as chromatin binding (which had a smaller DAF than nerve growth) or protein binding (with the smallest p-value) appeared to be under purifying selection. As with the Feizi et al. paper, the supplement produced a story much less clean than the one presented in the main body of the paper. In fact, retinal cone cell development and nerve growth factor were 33 and 34 out of the 55 listed GO categories when sorted by the DAF p-value (42 and 54 when sorted by heterozygosity p-value). In other words, the story being sold in the paper was based on blatant statistically invalid cherry picking. The other result of the paper was an estimate that in addition to the 5% of the human genome conserved across mammalian genomes, at least another 4% has been subject to lineage-specific constraint. This result was based on adding up the estimates of constrained nucleotides from their Table S6 (using the derived allele frequency measure). These were calculated using a statistic that was computed as follows: for each one of ten bins determined according to estimated background selection strength, and for every feature F, the average DAF value DF was rescaled to $PUC_F = \frac{(D_F - D_{CNDC})}{(D_{NCNE}-D_{CNDC})}$, where DCNDC and DNCNE were the bin-specific average DAFs of conserved non-degenerate coding regions and non-conserved non-ENCODE regions respectively. One problem with the statistic is that the non-conserved regions contain nucleotides not conserved in all mammals, which is not the same as nucleotides not conserved in any mammals. The latter would have been needed in order to identify human specific constraint. Second, the statistic PUCF was used as a proxy for the proportion under constraint even though, as defined, it could be less than zero or greater than one. Indeed, in Table S6 there were four values among the confidence intervals for the estimated proportions using DAF that included values less than 0% or above 100%: Ward and Kellis were therefore proposing that some features might have a negative number of nucleotides under constraint. Moreover, while it is possible that after further rescaling PUCmight have correlated with the true proportion of nucleotides under constraint, there was no argument provided in the paper. Thus, while Ward and Kellis claimed to have estimated the proportion of nucleotides under constraint, they had only computed a statistic named “proportion under constraint”. Nicolas Bray and I wrote up these points in a short technical comment and submitted it to the journal Science early in November 2012. The comment was summarily rejected with a curt reply by senior editor Laura Zahn stating that “relative to other Technical Comments we have recently received we feel that the scope and focus of your comment make it more suitable for the Online Comments facility at Science, rather than as a candidate for publication as a Technical Comment.” It is worth noting that Science did decide to publish another comment: Phil Green and Brent Ewing’s, “Comment on’Evidence of Abundant and Purifying Selection in Humans for Recently Acquired Regulatory Functions‘”, Science 10 (2013). Green and Ewing’s comment is biological in nature. Their concern is that “… the polymorphism trends are primarily attributable to mutational variation and technical artifacts rather than selection.” Its fine that Science decided to host a debate on a biology question on its pages, but how can one debate the interpretation of results from a method, when the method is fundamentally flawed to begin with? After all, our problem with PUC was much deeper than a “technical flaw”. We decided at the end to place the comment in the arXiv. After doing so, it became apparent that it had little impact. Indeed, I have never received any feedback about it from anyone. Apparently even this was too much for Manolis Kellis. Methods matter By the time I noticed the Feizi et al. paper in the journal Nature Biotechnology early in August 2013, my experiences reading Kellis’ papers had subtly altered the dynamic between myself and the printed word. Usually, when I read a paper and I don’t understand something, I assume the fault lies with me. I think most people are like this. But now, when the Feizi et al. paper started to not make sense, I didn’t presume the problem was with me. I tried hard to give the paper a fair reading, but after a few paragraphs the spell of the authors was already broken. And so it is that Nicolas Bray and I came to figure out what was really going on in Feizi et al., a project that eventually led us to also look at Barzel-Barabási. Speaking frankly, it was difficult work to write the blog posts about these articles. In addition to the time it took, it was exhausting and exasperating to discover the flaws, fallacies and frauds. Both Nick and I prefer to do research. But we felt a responsibility to spell out in detail what had happened here. Manolis Kellis is not just any scientist. He has, and continues to play leading roles in major consortium projects such as mod-ENCODE and ENCODE, and he has served on numerous advisory committees for the NHGRI. He is a member of the GCAT (Genomics, Computational Biology and Technology) study section until 2018. That any person would swap out a key figure in a published paper without publishing a correction, and without informing the editor is astonishing. That a person with great responsibility towards scientists is an abuser of science is unacceptable. Manolis Kellis’ behavior is part of a systemic problem in computational biology. The cross-fertilization of ideas between mathematics, statistics, computer science and biology is both an opportunity and a danger. It is not hard to peddle incoherent math to biologists, many of whom are literally math phobic. For example, a number of responses I’ve received to the Feizi et al. blog post have started with comments such as “I don’t have the expertise to judge the math, …” Similarly, it isn’t hard to fool mathematicians into believing biological fables. Many mathematicians throughout the country were recently convinced by Jonathan Rothberg to donate samples of their DNA so that they might find out “what makes them a genius”. Such mathematicians, and their colleagues in computer science and statistics, take at face value statements such as “we have figured out what makes a human human”. In the midst of such confusion, it is easy for an enterprising “computational person” to take advantage of the situation, and Kellis has. I believe the solution for this problem is for computational biologists to start taking themselves more seriously. Whether serving as reviewers for journals, as panel members for funding agencies, on hiring/tenure committees, or writing articles, all of us have to tone down the hype and pay closer attention to the science. There are many examples of what this means: a review of a math/stats containing paper cannot be a single paragraph long and based on a hunch, and similarly computational biologists shouldn’t claim, as have many of the authors of papers I’ve reviewed in these posts, pathways to cure disease and explanations for what makes humans human. Don’t fool the biologists. Don’t fool the computer scientists, statisticians, and mathematicians. The possibilities for computational methods in biology are unlimited. The future is exciting, and there are possibilities for significant advances in areas ranging from molecular and evolutionary biology to medicine. But money, citations and fame cannot rule the day. The details of the #methodsmatter. This is the second in a series of three posts (part1, part3) on two back-to-back papers published in Nature Biotechnology in August 2013: 1. Baruch Barzel & Albert-László Barabási, Network link prediction by global silencing of indirect correlationsNature Biotechnology 31(8), 2013, p 720–725. doi:10.1038/nbt.2601 2. Soheil Feizi, Daniel Marbach, Muriel Médard & Manolis Kellis, Network deconvolution as a general method to distinguish direct dependencies in networksNature Biotechnology 31(8), 2013, p 726–733. doi:10.1038/nbt.2635 The Barzel & Barabási paper we discussed in yesterday’s blog post is embarrassing for its math, shoddy “validations” and lack of results. However of the two papers above it is arguably the “better” one. That is because the paper by Soheil Feizi, Daniel Marbach, Muriel Médard & Manolis Kellis, in addition to having similar types of problems to the Barzel & Barabási paper, is also dishonest and fraudulent. For reasons that I will explain in the third and final post in this series, we (Nicolas Bray and I) started looking at the Feizi et al. paper shortly after it was published early in August 2013. This is the story: Feizi et al.‘s paper describes a method called network deconvolution that in their own words provides “…a systematic method for inferring the direct dependencies in a network, corresponding to true interactions, and removing the effects of transitive relationships that result from indirect effects.” They claim that the method is a “foundational graph theoretic tool” and that it “is widely applicable for computing dependencies in network science across diverse disciplines.” This high brow language makes network deconvolution sounds very impressive, but what exactly is the method? Feizi et al. would like you to believe that the method is what is described in their Figure 1, part of which is shown below: This is a model for networks represented as matrices. In this model an unknown matrix $G_{dir}$ with eigenvalues between -1 and 1 is to be inferred from an observed matrix $G_{obs}$ that is related to $G_{dir}$ via $G_{obs} = G_{dir}+G_{dir}^2+G_{dir}^3+\cdots$ (1) The matrix $G_{dir}$ represents “direct effects” and the sum of its powers “indirect effects”. It is probably worth noting at this point that there is no particular reason to believe that effects will behave in this manner in any particular system of interest, but we return to this later. The eigenvalue assumption on $G_{dir}$ is required for the specification of the model, because without it the infinite sum generating $G_{obs}$ does not converge. An elementary manipulation of (1) shows that the matrix $G_{dir}$ can be recovered from $G_{obs}$ by the formula $G_{dir} = G_{obs}(I+G_{obs})^{-1}$. (2) Unfortunately, many matrices cannot be decomposed as an infinite sum of powers of some matrix as in equation (1), so equation (2) cannot be applied directly to arbitrary data matrices. The only mention of this issue in the main text of the paper is the statement that “This [eigenvalue] assumption can be achieved for any matrix by scaling the observed input network by a function of the magnitude of its eigenvalues.” This is true but incoherent. There are an infinite number of scalings that will satisfy the assumption, and while the authors claim in their FAQ that “the effect of linear scaling on the input matrix is that … it does not have an effect” this is obviously false (also if it has no effect why do they do it?). For example, as the scaling goes to zero, $G_{dir}$ converges to $G_{obs}$. What the authors have actually done with their seemingly innocuous aside is fundamentally change their model: now instead of (1), $G_{obs}$ is given by $G_{obs} = \gamma \cdot (G_{dir}+G_{dir}^2+G_{dir}^3+\cdots)$. (3) The problem with this model is that given $G_{obs}$ there are an infinite number of solutions for $\gamma$ and $G_{dir}$. Feizi et al.‘s approach to dealing with this is to introduce a scaling parameter that must be chosen a priori. They do not even mention the existence of this parameter anywhere in the main text. Insteadthey choose to make a false statement in the caption of Figure 1 when they write “When these assumptions hold, network deconvolution removes all indirect flow effects and infers all direct interactions and weights exactly.” Even when $G_{obs}$ satisfies the eigenvalue constraint, once it is scaled before applying equation (2) the matrix $G_{dir}$ has probability zero of being recovered. In the video below, recorded at the Banff workshop on Statistical Data Integration Challenges in Computational Biology: Regulatory Networks and Personalized Medicine (August 11–16, 2013), you can see an explanation of network deconvolution by the corresponding author Kellis himself. The first part of the clip is worth watching just to see Kellis describe inverting a matrix as a challenge and then explain the wrong way to do it. But the main point is at the end (best viewed full screen with HD): Manolis Kellis received his B.S., M.S. and Ph.D degrees in computer science and electrical engineering from MIT, so it is hard to believe that he really thinks that solving (2), which requires nothing more than a matrix inversion, must be accomplished via eigendecomposition. In fact, inverting a 2000 x 2000 matrix in R is 50% slower using that approach. What is going on is that Kellis is hiding the fact that computation of the eigenvalues is used in Feizi et al. in order to set the scaling parameter, i.e. that he is actually solving (3) and not (2). Indeed, there is no mention of scaling in the video except for the mysteriously appearing footnote in the lower left-hand corner of the slide starting at 0:36 that is flashed for 2 seconds. Did you have time to think through all the implications of the footnote in 2 seconds, or were you fooled? While it does not appear in the main text, the key scaling parameter in network deconvolution is acknowledged in the supplementary material of the paper (published with the main paper online July 14, 2013). In supplementary Figure 4, Feizi et al. show the performance of network deconvolution with different scaling parameters ($\beta$) on the three different datasets examined in the paper: Figure S4, July 14, 2013. In the words of the authors, the point of this figure is that “…choosing $\beta$ close to one (i.e., considering higher order indirect interactions) leads to the best performance in all considered network deconvolution applications.” However, while the supplement revealed the existence of $\beta$, it did not disclose the values used for the results in the paper. We inquired with the authors and were surprised to discover that while $\beta = 0.95$ was used for the protein networks and $\beta = 0.99$ for the co-authorship network, $\beta=0.5$ was used for the DREAM5 regulatory networks violating their own advice. What rationale could there be for such a different choice, especially one very far away from the apparently optimal choice “near 1”? We asked the authors, who initially replied that the parameter setting didn’t make a difference. We then asked why the parameter would be set differently if its value didn’t make a difference; we never got an answer to this question. Although it may be hard to believe, this is where the story gets even murkier. Perhaps as a result of our queries, the scaling parameters were publicly disclosed by Feizi et al. in a correction to the original supplement posted on the Nature Biotechnology website on August 26, 2013. The correction states “Clarification has been made to Supplementary Notes 1.1, 1.3, 1.6 and Supplementary Figure 4 about the practical implementation of network deconvolution and parameter selection for application to the examples used in the paper. ” But Supplementary Figure 4 was not clarified, it was changed, a change not even acknowledged by the authors, let alone explained. Below is Figure S4 from the updated supplement for comparison with the one from the original supplement shown above. Figure S4, August 26, 2013. The revised figure is qualitatively and quantitatively different from its predecessor. A key point of the paper was changed- the scaling parameter $\beta$ is now seen to not be optimal near 1, but rather dataset dependent (this is reflected in a corresponding change in the text: in reference to choosing $\beta$ close to 1 “best performance” was changed to “high performance”). Specifically, the regulatory network dataset now has an optimal value of $\beta$ close to 0.5, perhaps providing an answer to our question above about why $\beta=0.5$ was used for that dataset alone. However that explanation directly implies that the authors knew the original figure was incorrect. We are left with two questions: 1. If the authors believed their original figure and their original statement that $\beta$ close to 1 “leads to the best performance in all considered network deconvolution applications”, then why did they use $\beta=0.5$ for one of the applications? 2. When and how was the July 16th Figure S4 made? Manolis Kellis declined to answer question 1 and by not acknowledging the figure change in the correction did not permit the readers of Nature Biotechnology to ask question 2. Clearly we have fallen a long way from the clean and canonical-seeming Figure 1. We wish we could say we have reached the bottom, but there is still a ways to go. It turns out that “network deconvolution” is actually neither of the two models (2) and (3) above, a fact revealed only upon reading the code distributed by the authors. Here is what is in the code: 1. affinely map the entries of the matrix $G_{obs}$ to lie between 0 and 1, 2. set the diagonal of the matrix to 0, 3. threshold the matrix, keeping only the largest $\alpha$ fraction of entries, 4. symmetrize the matrix, 5. scale the matrix so that $G_{dir}$ inferred in the next step will have maximum eigenvalue $\beta$, 6. apply formula (2), 7. affinely map between 0 and 1 again. The parameter $\beta$ is the scaling parameter we’ve already discussed, but there is a second parameter $\alpha$. When the Feizi et al. paper was first published, the parameter $\alpha$ appeared in the code with default value 1, and was literally not mentioned in the paper. When asked, the authors revealed that it also takes on different values in the different experiments in the paper. In particular, the regulatory networks DREAM5 experiments used $\alpha=0.1$. The only guidance for how to set $\alpha$, now provided with the new version of the supplement, is “In practice, the threshold chosen for visualization and subsequent analysis will depend on the specific application.” It is hard to say what steps 1–7 actually do. While it is impossible to give a simple analytic formula for this procedure as a whole, using the Sherman-Morrison formula we found that when applied to a correlation matrix C, steps 1, 2, 4, 6 and 7 produce a matrix whose ijth entry is (up to an affine mapping) $P_{ij} + \Sigma_{kl}P_{ij}P_{kl}+m\Sigma_{kl}P_{ik}P_{jl}$ where $P=C^{-1}$ and m is the minimum entry of C. Omitting step 1 results in Pii , the inverse correlation matrix, so the effect of the mapping in this case is the addition of the final two terms whose possible meaning escapes us. We are left to wonder why the authors chose to add this step which lacks any apparent theoretical justification. This bring us to question the meaning of the contribution of author Muriel Médard, a professor of information theory at MIT. According to the contributions section she is the one who “contributed to correctness proof and robustness analysis”. Presumably “correctness” meant describing the two steps needed to show that the solution to (1) is (2). But what would actually be relevant to the paper is a theorem about steps 1–7. Unfortunately such a theorem is hard to imagine. When Feizi et al. was published in August 2013, they set up a companion website at http://compbio.mit.edu/nd/ where the code and data was made available (the data is no longer available at the site). On December 4, 2013, the authors updated the website. While there used to be one version of the program, there are now two different versions, one specifically for use on regulatory networks where the default values for the parameters are $\beta=0.5, \, \alpha =0.1$ and the other with defaults $\beta=0.9, \, \alpha=1$“Network deconvolution” is supposedly a universal method applicable to any kind of network. Will the authors continue to distribute different domain-specific versions of the program, each with their own “default” parameters somehow divined by themselves? To summarize, “network deconvolution”, as best as we can currently tell, consists of steps 1–7 above with the parameters: • DREAM5 challenge: $\beta = 0.5$ and $\alpha=0.1$. • Protein network: $\beta = 0.99$ and $\alpha=1$. • Co-authorship: $\beta=0.95$ and $\alpha=1$. The description of the method in the main text of the paper is different from what is in the online methods, which in turn is different from what is in the supplement, which in turn is different from what is actually in each of the (two) released programs, namely the ad-hoc heuristic steps we have described above. Having said all of that, one might be curious whether the “method” actually works in practice. Sometimes heuristics work well for reasons that are not well understood (although they are usually not published in journals such as Nature Biotechnology). Unfortunately, we have to disclose that we don’t know how the method performs on the datasets in the paper. We tried really hard to replicate the results of the paper, running the software on the provided input matrices and comparing them to the distributed output matrices (available on the NBT website). Despite our best attempts, and occasionally getting very close results, we have been unable to replicate the results of the paper. This bothers us to such an extent that I, Lior Pachter, am hereby offering$100 to anyone (including Feizi et al.) who can provide me with the code and input to run network deconvolution and produce exactly the figures in the paper (including both versions of Figure S4) and the output matrices in the supplementary data.

Because we couldn’t replicate the results of the paper, we decided to examine how FK performs in one case for which Feizi et al. originally claimed (Supplement version July 14 2013) that their method is optimal. They stated “if the observed network is a covariance matrix of jointly Gaussian variables, ND infers direct interactions using global partial correlations”. This claim was deleted without mention in the updated supplement, however even if network deconvolution is not optimal for inferring a network from a correlation matrix (presumably what they meant in the original statement), the claimed universality suggested in the paper implies that it should be very good on any input.

The figure above shows a comparison of the Feizi et al. method (FK) with the various beta parameters used in their paper to regularized partial correlation, the “geometric root” (equation (2)) and the naïve method of simply selecting the top entries of the correlation matrix. In order to test the method we performed 40 simulations of sampling 500 observations from a random Gaussian graphical model with 1000 variables and an edge density of 5% to ensure the graph was connected yet sparse. Performance was assessed by comparing the ranking of the edges to the true edges present in the model and computing the area under the corresponding ROC curve. As a benchmark we applied partial correlation structural inference based on a James-Stein shrinkage estimation of the covariance matrix (a standard approach to gene regulatory network inference, described in Schäfer, J. & Strimmer, K. A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics. Statistical Applications in Genetics and Molecular Biology 4, (2005)).

That FK does better than the most naïve method and yet worse than a method actually designed for this task is perhaps to be expected for a heuristic method that is based on a metaphor (one that dates back to Seawall Wright’s work on path coefficients in the 1920s- although Wright used (1) in the restricted setting of directed acyclic graphs where it makes sense). It is possible that in some contexts, FK will make things somewhat better. However what such contexts might be is unclear. The only guidance that Feizi et al. provide on the assumptions needed to run their method is that they state in the supplement that the model assumes that networks are “linear time-invariant flow-preserving operators”. While it is true that individual parts of that phrase mean something in certain contexts, the complete phrase is word salad. We wonder: is co-authorship flow preserving?

To conclude: Feizi et al. have written a paper that appears to be about inference of edges in networks based on a theoretically justifiable model but

1. the method used to obtain the results in the paper is completely different than the idealized version sold in the main text of the paper and
2. the method actually used has parameters that need to be set, yet no approach to setting them is provided. Even worse,
3. the authors appear to have deliberately tried to hide the existence of the parameters. It looks like
4. the reason for covering up the existence of parameters is that the parameters were tuned to obtain the results. Moreover,
5. the results are not reproducible. The provided data and software is not enough to replicate even a single figure in the paper. This is disturbing because
6. the performance of the method on the simplest of all examples, a correlation matrix arising from a Gaussian graphical model, is poor.

In academia the word fraudulent is usually reserved for outright forgery. However given what appears to be deliberate hiding, twisting and torturing of the facts by Feizi et al., we think that fraud (“deception deliberately practiced in order to secure unfair or unlawful gain”) is a reasonable characterization. If the paper had been presented in a straightforward manner, would it have been accepted by Nature Biotechnology?

Post scriptum. After their paper was published, Feizi et al. issued some press releases. They explain how original and amazing their work is. Kellis compliments himself by explaining that “Introducing such a foundational operation on networks seems surprising in this day and age” and Médard adds that “the tool can be applied to networks of arbitrary dimension.” They describe the method itself as a way to have “…expressed the matrix representing all element correlations as a function of its principal components, and corresponding weights for each component” even though principal components has nothing to do with anything in their paper. As with Barzel-Barabási, this seems worthy of nomination for the Pressies.

Post post scriptum: In September 2013, we submitted a short commentary to Nature Biotechnology with many of the points made above. While it was rejected four months later, we really really wish we could share the reviews here. Unfortunately they were sent to us with a confidentiality restriction, though having said that, any of the reviewers are hereby invited to guest post on network science on this blog. We were disappointed that Nature Biotechnology would not publish the commentary, but not surprised. Frankly, it would have been too much to expect them and their reviewers to unravel the layers of deception in Feizi et al. in the initial review process; it took us an extraordinary amount of time and effort. Having published the paper, and without a clear path to unpublication (retraction is usually reserved for faking of experimental data), there were presumably not many options. The question is whether anything will change. Feizi et al. apparently have more Nature Biotechnology papers on the way as evidenced by an extraordinary declaration on Soheil Feizi’s publication page (this is a pdf of the version when this blog was posted; Feizi changed his site an hour later) where the publication venue of their next paper appears to have been preordained:

S. Feizi, G. Quon , M. Mendoza, M. Medard, M. Kellis
Comparative Analysis of Modular Integrative Regulatory Networks across Fly, Worm and Human
in preparation for Nature Biotechnology.

The term “in preparation” is common on academic websites, especially on sites of graduate students or postdocs who are eager to advertise forthcoming work, but “in preparation for” really requires some hubris. We have to wonder, why do Feizi et al. assume that peer review is just a formality in the journal Nature Biotechnology?

[Addendum 2/23: The authors posted a rebuttal to this blog post, which we replied to here.]

In the August 2013 issue of Nature Biotechnology there were two back-to-back methods papers published in the area of network theory:

1. Baruch Barzel & Albert-László Barabási, Network link prediction by global silencing of indirect correlationsNature Biotechnology 31(8), 2013, p 720–725. doi:10.1038/nbt.2601.
2. Soheil Feizi, Daniel Marbach, Muriel Médard & Manolis Kellis, Network deconvolution as a general method to distinguish direct dependencies in networksNature Biotechnology 31(8), 2013, p 726–733. doi:10.1038/nbt.2635.

This post is the first of a trilogy (part2, part3) in which my student Nicolas Bray and I tell the story of these papers and why we took the time to read them and critique them.

We start with the Barzel-Barabási paper that is about the applications of a model proposed by Barzel and his Ph.D. advisor, Ofer Biham (although all last names start with a B, Biham is not to be confused with Barabási):

In order to quantify connectivity in biological networks, Barzel and Biham proposed an experimental perturbation model in the paper Baruch Barzel & Ofer Biham, Quantifying the connectivity of a network: The network correlation function methodPhys. Rev. E 80, 046104 (2009) that forms the basis for network link prediction in Barzel-Barabási. In the context of biology, link prediction refers to the problem of identifying functional links between genes from data that may be confounded by indirect effects. For example, if gene A inhibits the expression of gene B, and also gene B inhibits the expression of gene C, then if the expression of A increases, it will decrease the expression of B, which in turn increase C. Therefore one might observe correlation in the expression levels of gene A and C, even though there is no direct interaction between them. The Barzel-Biham model is based on perturbation experiments. Assuming that a system of genes is in equilibrium, it is a model for the change in expression of one gene in response to a small perturbation in another.

The parameters in the Barzel-Biham model are entries in what they call a “local response matrix” S (any matrix with $S_{ii}=0$ for all i). Physical arguments pertaining to perturbations at equilibrium lead to the equations

$G=SG \,$ (off the diagonal) and $G_{ii}=1$ for all i.                                  (1)

for a “global response matrix” G that can, in principle, be observed and used to infer the matrix S. The innovation of Barzel and Barabási is to provide an approximate formula for recovering S from G, specifically the formula

$S \approx (G-I+D((G-I)G))^{-1}$                                                       (2)

where D(M) denotes the operation setting off-diagonal elements of M to zero. A significant part of the paper is devoted to showing that the approximation (2) is good. Then they suggest that (2) can be used to infer direct causal links in regulatory networks from collections of expression experiments. Barzel and Barabási claim that the approximation formula (2) is necessary because exact inference of S from G requires solving the intractable system of $n^2$ equations

$G=SG \,$ (off the diagonal) and $S_{ii}=0$ for all i.                                  (3)

The assertion of intractability is based on the claim that the equations are coupled. They reason that since the naïve matrix inversion algorithm requires $O(m^3)$ operations for m equations, the solution of (3) would require time $O(n^6)$. When we looked at this system, our first thought was that while it is large, it is also structured. We sat down and started examining it by writing down the equations for a simple case: a matrix S for a graph on 3 nodes. We immediately noticed the equations decoupled into n systems of n equations where system is given by $S_{ii}=0$ and $G_{ij} = \sum_k S_{ik} G_{kj}, \, (j \neq i)$, with the n unknowns $S_{i1},\ldots,S_{in}$. This immediately reduces the complexity to $O(n^4)$, or even $O(n^3)$ by simple parallelization. In other words, the system is trivially tractable.

But there is more: while looking at the paper I had to take a quick bathroom break, and by the time I returned Nick had realized he could apply the Sherman-Morrison formula to obtain the following formula for the exact solution:

$S=I-D(1/G^{-1})G^{-1}$.                                                                    (4)

Here the operator “/” denotes element-wise division, a simple operation to execute, so that inferring S from G requires no more than inverting G and scaling it, a formula that is also much simpler and more efficient to compute than (2). [Added 2/23: Jordan Ellenberg  pointed out the obvious fact that $G=SG$ off the diagonal means that $SG=G-D$ for some diagonal matrix $D$, and therefore $S=I-DG^{-1}$ and since the diagonal entries of $S$ must be zero it follows that $D=D(1/G^{-1})$. In other words, the Sherman-Morrison formula is not even needed]. While it would be nice for us to claim that our managing to quickly supersede the main result of a paper published in Nature Biotechnology was due to some sort of genius, in fact the entire exercise would be suitable for an undergraduate linear algebra homework problem. Barabási likes to compare himself to the great physicist and nobel laureate Subrahmanyan Chandrasekhar, but it is difficult to imagine the great Chandrasekhar having similar difficulties.

The approach to solving (4) has an implication that is even more important than the solution itself. It provides a dual formula for computing G from S according to (1), i.e. to simulate from the model. Using the same ideas as above, one finds that

$G=TD(1/T)$ where $T=(S-I)^{-1}$.                                              (5)

Unlike Barzel & Barabási that resorted to simulating with Michaelis-Menten dynamics in their study of performance of their approximation, using (4) we can efficiently simulate data directly from the model. One issue with Michaelis-Menten dynamics is that they make more sense for enzymatic networks as opposed to regulatory networks (for more on this see Karlebach, G. & Shamir, R. Modelling and analysis of gene regulatory networksNat Rev Mol Cell Biol 9, 770–780 (2008)), but in any case performance on such dynamics is hardly a validation of (2) since its mixing apples and oranges. So what happens when one simulates data from the Barzel-Biham model and then tries to recover the parameters?

A comparison of the standard method of regularized partial correlations with exact inference for the Barzel-Biham model. Random sparse graphs were generated according to the Erdös-Renyi graph model G(5000,p) where was varied to assess performance at different graph densities (shown on x-axis) The y-axis shows the average AUROC obtained from 75 random trials at each density.

When examining simulations from the Barzel-Biham model with graphs on 5,000 nodes (see Figure above), we were surprised to discover that when adding even small amounts of noise, the exact algorithm (4) failed to recover the local response matrix from G (we also analyzed the approximation (3) and observed that it always resulted in performance inferior to (4), and that 5% of the time the correlation with the exact solution was negative). This sensitivity to noise is due to the term $D(1/G^{-1})$ in the exact formula which becomes problematic if the diagonal entries of $G^{-1}$ are close to zero.

Some intuition for the behavior of $G^{-1}$ may be gained from noting that if S is such that its geometric sum converges, the diagonal of $G^{-1}$ is equal to that of $I+S+S^2+S^3+\cdots$. If S has mixed signs and there is significant feedback within the network, the diagonal of $G^{-1}$ may be close to zero and any noise in the measurement of G could create very large fluctuations in the inferred S. This means that the results in Figure 1 are not dependent on the graph model chosen (Erdös-Renyi) and will occur for any reasonable model of gene regulatory networks including the modeling of both enhancers and repressors. From Figure 2a in their paper, it appears that Barzel and Barabási used in their simulation an S with only positive entries that would preclude such effects. Such an assumption is biologically unrealistic.

However, the difficulties with noise for the Barzel-Biham model go much deeper. While a constant signal-to-noise ratio, as assumed by Barzel and Barabasi, is a commonly used model for errors in experiments, it is important to remember that there is no experiment for directly measuring the elements of G. Obtaining $G_{ij}$ from an experiment is done by making a small perturbation of size e to gene i, observing the change in gene j, and then dividing that change by e. This last step increases the noise on the estimate of $G_{ij}$ by a factor of 1/e (a large number, for a perturbative experiment) above the noise already present in the measurements. Increasing e acts to remove the system from the perturbative regime and thereby increases the intrinsic error in estimating G.  It is therefore the case that attaining reasonable error on G will require very low noise in the original measurements. In this case of biological networks this would mean performing many replicates of the experiments. However, as Barzel & Barabási acknowledge in their paper, even a single replicate of a perturbation experiment is not currently feasible.

While the exact algorithm (4) for inverting the Barzel-Biham model performs poorly, we found that a widely used shrinkage method based on partial correlation (Schäfer, J. & Strimmer, K. A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional GenomicsStatistical Applications in Genetics and Molecular Biology 4, (2005)) outperforms the exact algorithm (blue curve in Figure above). This suggests that there is no input for which (4) might be useful. The method is not even ideal for inference from data generated by the model it is based on.

This brings us to the “results” section of the paper. To demonstrate their method called Silencer, Barzel & Barabási ran it on only one of three datasets from the DREAM5 data. They then compared the performance of Silencer to three out of thirty five methods benchmarked in DREAM5. The Barzel-Biham model is for perturbation experiments, but Barzel & Barabási just threw in data from another universe (e.g. mutual information matrices). But lets just go with that for a moment. Their results are shown in the figure below:

Figure 3 from Barzel-Barabási.

The three methods tested potential improvements on are Pearson, Spearman and Mutual Information. Pearson and Spearman rank 16/35 and 18/35 respectively in the DREAM5 benchmarks. There may be some reason why Silencer applied on top of these methods improves performance: in the case where G is a correlation matrix, the path interpretation given by Barzel and Barabási connects the inference procedure to Seawall Wright’s path coefficients (ca. 1920), which in turn suggests an interpretation in terms of partial correlation. However in the case of mutual information, a method that is ranked 19/35 in the DREAM5 benchmarks, there is no statistically significant improvement at all. The improvement is from an AUROC of 0.67 to 0.68. Amazingly, Barzel and Barabási characterize these results by remarking that they “improve upon the top-performing inference methods” (emphasis on top is ours).  Considering that the best of these rank 16/35 the use of the  word “top” seems, shall we say, unconventional.

We have to ask: how did Barzel & Barabási get to publish a paper in the journal Nature Biotechnology on regulatory network inference without improvement or testing on anything but a handful of mediocre DREAM5 methods from a single dataset? To put the Barzel-Barabási results in context, it is worth considering the standards the Feizi et al. paper were held to. In that paper the authors compared their method to DREAM5 data as well, except they tested on all 3 datasets and 9 methods (and even on a community based method). We think its fair to conclude that significantly more testing would have to be done to argue that Silencer improves on existing methods for biological network inference.

We therefore don’t see any current practical utility for the Barzel-Biham model, except possibly for perturbation experiments in small sub-networks. Even then, we don’t believe it is practical to perform the number of experiments that would be necessary to overcome signal to noise problems.

Unfortunately the problems in Barzel-Barabási spill over into a follow up article published by the duo: Barzel, Baruch, and Albert-László Barabási, “Universality in Network Dynamics.” Nature Physics 9 (2013). In the paper they assume that the local response matrix has entries that are all positive, i.e. they do not allow for inhibitory interactions. Such a restriction immediately renders the results of the paper, if they are to be believed, moot in terms of biological significance. Moreover, the restrictions on S appear to be imposed in order to provide approximations to G that are unnecessary in light of (5). Given these immediate issues, we suspect that were we to read the Universality paper carefully, it is quite likely this post would have to be lengthened considerably.

These are not the first of Barabási’s papers to package meaningless and incoherent results in Nature/Science publications. In fact, there is a long history of Barabási publishing with fanfare in top journals only to have others respond by publishing technical comments on his papers, in many cases refuting completely the claims he makes. Unfortunately many of the critiques are not well known because they are rejected from the journals where Barabási is successful, and instead find their way to preprint servers or more specialized publications. Here is a partial list of Barabási finest and the response(s):

1. Barabasi is famous for the “BA model”, proposed in Barabási and Albert ‘Emergence of Scaling in Random Networks“, Science, Vol. 286 15 October 1999, pp. 509-512. Lada Adamic and Bernardo Huberman immediately refuted the practical applications of the model. Moreover, as pointed out by Willinger, Alderson and Doyle, while it is true that scale-free networks exhibit some interesting mathematical properties (specifically they are resilient to random attack yet vulnerable to worst-case), even the math was not done by Barabási but by the combinatorialists Bollobás and Riordan.
2. Barabási has has repeatedly claimed metabolic networks as prime examples of scale-free networks, starting with the paper Jeong et al., “The large-scale organization of metabolic networks“, Nature 407 (2000). This fact has been disputed and refuted in the paper Scale Rich Metabolic Networks by Reiko Tanaka.
3. The issue of attack tolerance was the focus of Error and attack tolerance of complex networks by Réka Albert, Hawoong Jeong & Albert-László Barabási in Nature 406 (2000). John Doyle refuted the paper completely in this paper.
4. In the paper “The origin of bursts and heavy tails in human dynamics” published in Nature 435 (2005) Barabási pretends to offer insights into the “bursty nature of human behavior” (by analyzing e-mail). In a follow up comment Daniel Stouffer, Dean Malmgren and Luis Amaral demonstrate that the reported power-law distributions are solely an artifact of the analysis of the empirical data and the proposed model is not representative of e-mail communication patterns.
5. Venturing into the field of control theory, the paper “Controllability of complex networks” by Liu, Slotine and Barabási, Nature 473 (2011) argues that “sparse inhomogeneous networks, which emerge in many real complex systems, are the most difficult to control, but that dense and homogeneous networks can be controlled using a few driver nodes.” Not so. In a beautiful and strong rebuttal, Carl Bergstrom and colleagues show that a single control input applied to the power dominating set is all that is required for structural controllability of most, if not all networks. I have also blogged about this particular paper previously, explaining the Bergstrom result and why it reveals the Barabási paper to be a control theory embarrassment.

In other words, Barabási’s “work” is a regular feature in the journals Nature and Science despite the fact that many eminent scientists keep demonstrating that the network emperor has no clothes.

Post scriptum. After their paper was published, Barzel and Barabási issued a press release claiming that “their research moves the team a step closer in its quest to under­stand, pre­dict, and con­trol human disease.” The advertisment seems like an excellent candidate for Michael Eisen’s pressies.