This is the second in a series of three posts (part1, part3) on two back-to-back papers published in Nature Biotechnology in August 2013:

1. Baruch Barzel & Albert-László Barabási, Network link prediction by global silencing of indirect correlationsNature Biotechnology 31(8), 2013, p 720–725. doi:10.1038/nbt.2601
2. Soheil Feizi, Daniel Marbach, Muriel Médard & Manolis Kellis, Network deconvolution as a general method to distinguish direct dependencies in networksNature Biotechnology 31(8), 2013, p 726–733. doi:10.1038/nbt.2635

The Barzel & Barabási paper we discussed in yesterday’s blog post is embarrassing for its math, shoddy “validations” and lack of results. However of the two papers above it is arguably the “better” one. That is because the paper by Soheil Feizi, Daniel Marbach, Muriel Médard & Manolis Kellis, in addition to having similar types of problems to the Barzel & Barabási paper, is also dishonest and fraudulent. For reasons that I will explain in the third and final post in this series, we (Nicolas Bray and I) started looking at the Feizi et al. paper shortly after it was published early in August 2013. This is the story:

Feizi et al.‘s paper describes a method called network deconvolution that in their own words provides “…a systematic method for inferring the direct dependencies in a network, corresponding to true interactions, and removing the effects of transitive relationships that result from indirect effects.” They claim that the method is a “foundational graph theoretic tool” and that it “is widely applicable for computing dependencies in network science across diverse disciplines.” This high brow language makes network deconvolution sounds very impressive, but what exactly is the method?

Feizi et al. would like you to believe that the method is what is described in their Figure 1, part of which is shown below: This is a model for networks represented as matrices. In this model an unknown matrix $G_{dir}$ with eigenvalues between -1 and 1 is to be inferred from an observed matrix $G_{obs}$ that is related to $G_{dir}$ via $G_{obs} = G_{dir}+G_{dir}^2+G_{dir}^3+\cdots$                         (1)

The matrix $G_{dir}$ represents “direct effects” and the sum of its powers “indirect effects”. It is probably worth noting at this point that there is no particular reason to believe that effects will behave in this manner in any particular system of interest, but we return to this later. The eigenvalue assumption on $G_{dir}$ is required for the specification of the model, because without it the infinite sum generating $G_{obs}$ does not converge. An elementary manipulation of (1) shows that the matrix $G_{dir}$ can be recovered from $G_{obs}$ by the formula $G_{dir} = G_{obs}(I+G_{obs})^{-1}$.                                    (2)

Unfortunately, many matrices cannot be decomposed as an infinite sum of powers of some matrix as in equation (1), so equation (2) cannot be applied directly to arbitrary data matrices. The only mention of this issue in the main text of the paper is the statement that “This [eigenvalue] assumption can be achieved for any matrix by scaling the observed input network by a function of the magnitude of its eigenvalues.” This is true but incoherent. There are an infinite number of scalings that will satisfy the assumption, and while the authors claim in their FAQ that “the effect of linear scaling on the input matrix is that … it does not have an effect” this is obviously false (also if it has no effect why do they do it?). For example, as the scaling goes to zero, $G_{dir}$ converges to $G_{obs}$.

What the authors have actually done with their seemingly innocuous aside is fundamentally change their model: now instead of (1), $G_{obs}$ is given by $G_{obs} = \gamma \cdot (G_{dir}+G_{dir}^2+G_{dir}^3+\cdots)$.             (3)

The problem with this model is that given $G_{obs}$ there are an infinite number of solutions for $\gamma$ and $G_{dir}$. Feizi et al.‘s approach to dealing with this is to introduce a scaling parameter that must be chosen a priori. They do not even mention the existence of this parameter anywhere in the main text. Insteadthey choose to make a false statement in the caption of Figure 1 when they write “When these assumptions hold, network deconvolution removes all indirect flow effects and infers all direct interactions and weights exactly.” Even when $G_{obs}$ satisfies the eigenvalue constraint, once it is scaled before applying equation (2) the matrix $G_{dir}$ has probability zero of being recovered.

In the video below, recorded at the Banff workshop on Statistical Data Integration Challenges in Computational Biology: Regulatory Networks and Personalized Medicine (August 11–16, 2013), you can see an explanation of network deconvolution by the corresponding author Kellis himself. The first part of the clip is worth watching just to see Kellis describe inverting a matrix as a challenge and then explain the wrong way to do it. But the main point is at the end (best viewed full screen with HD):

Manolis Kellis received his B.S., M.S. and Ph.D degrees in computer science and electrical engineering from MIT, so it is hard to believe that he really thinks that solving (2), which requires nothing more than a matrix inversion, must be accomplished via eigendecomposition. In fact, inverting a 2000 x 2000 matrix in R is 50% slower using that approach. What is going on is that Kellis is hiding the fact that computation of the eigenvalues is used in Feizi et al. in order to set the scaling parameter, i.e. that he is actually solving (3) and not (2). Indeed, there is no mention of scaling in the video except for the mysteriously appearing footnote in the lower left-hand corner of the slide starting at 0:36 that is flashed for 2 seconds. Did you have time to think through all the implications of the footnote in 2 seconds, or were you fooled?

While it does not appear in the main text, the key scaling parameter in network deconvolution is acknowledged in the supplementary material of the paper (published with the main paper online July 14, 2013). In supplementary Figure 4, Feizi et al. show the performance of network deconvolution with different scaling parameters ( $\beta$) on the three different datasets examined in the paper:

In the words of the authors, the point of this figure is that “…choosing $\beta$ close to one (i.e., considering higher order indirect interactions) leads to the best performance in all considered network deconvolution applications.” However, while the supplement revealed the existence of $\beta$, it did not disclose the values used for the results in the paper.

We inquired with the authors and were surprised to discover that while $\beta = 0.95$ was used for the protein networks and $\beta = 0.99$ for the co-authorship network, $\beta=0.5$ was used for the DREAM5 regulatory networks violating their own advice. What rationale could there be for such a different choice, especially one very far away from the apparently optimal choice “near 1”? We asked the authors, who initially replied that the parameter setting didn’t make a difference. We then asked why the parameter would be set differently if its value didn’t make a difference; we never got an answer to this question.

Although it may be hard to believe, this is where the story gets even murkier.

Perhaps as a result of our queries, the scaling parameters were publicly disclosed by Feizi et al. in a correction to the original supplement posted on the Nature Biotechnology website on August 26, 2013. The correction states “Clarification has been made to Supplementary Notes 1.1, 1.3, 1.6 and Supplementary Figure 4 about the practical implementation of network deconvolution and parameter selection for application to the examples used in the paper. ” But Supplementary Figure 4 was not clarified, it was changed, a change not even acknowledged by the authors, let alone explained. Below is Figure S4 from the updated supplement for comparison with the one from the original supplement shown above.

The revised figure is qualitatively and quantitatively different from its predecessor. A key point of the paper was changed- the scaling parameter $\beta$ is now seen to not be optimal near 1, but rather dataset dependent (this is reflected in a corresponding change in the text: in reference to choosing $\beta$ close to 1 “best performance” was changed to “high performance”). Specifically, the regulatory network dataset now has an optimal value of $\beta$ close to 0.5, perhaps providing an answer to our question above about why $\beta=0.5$ was used for that dataset alone. However that explanation directly implies that the authors knew the original figure was incorrect. We are left with two questions:

1. If the authors believed their original figure and their original statement that $\beta$ close to 1 “leads to the best performance in all considered network deconvolution applications”, then why did they use $\beta=0.5$ for one of the applications?

2. When and how was the July 16th Figure S4 made?

Manolis Kellis declined to answer question 1 and by not acknowledging the figure change in the correction did not permit the readers of Nature Biotechnology to ask question 2.

Clearly we have fallen a long way from the clean and canonical-seeming Figure 1. We wish we could say we have reached the bottom, but there is still a ways to go. It turns out that “network deconvolution” is actually neither of the two models (2) and (3) above, a fact revealed only upon reading the code distributed by the authors. Here is what is in the code:

1. affinely map the entries of the matrix $G_{obs}$ to lie between 0 and 1,
2. set the diagonal of the matrix to 0,
3. threshold the matrix, keeping only the largest $\alpha$ fraction of entries,
4. symmetrize the matrix,
5. scale the matrix so that $G_{dir}$ inferred in the next step will have maximum eigenvalue $\beta$,
6. apply formula (2),
7. affinely map between 0 and 1 again.

The parameter $\beta$ is the scaling parameter we’ve already discussed, but there is a second parameter $\alpha$. When the Feizi et al. paper was first published, the parameter $\alpha$ appeared in the code with default value 1, and was literally not mentioned in the paper. When asked, the authors revealed that it also takes on different values in the different experiments in the paper. In particular, the regulatory networks DREAM5 experiments used $\alpha=0.1$. The only guidance for how to set $\alpha$, now provided with the new version of the supplement, is “In practice, the threshold chosen for visualization and subsequent analysis will depend on the specific application.”

It is hard to say what steps 1–7 actually do. While it is impossible to give a simple analytic formula for this procedure as a whole, using the Sherman-Morrison formula we found that when applied to a correlation matrix C, steps 1, 2, 4, 6 and 7 produce a matrix whose ijth entry is (up to an affine mapping) $P_{ij} + \Sigma_{kl}P_{ij}P_{kl}+m\Sigma_{kl}P_{ik}P_{jl}$ where $P=C^{-1}$ and m is the minimum entry of C. Omitting step 1 results in Pii , the inverse correlation matrix, so the effect of the mapping in this case is the addition of the final two terms whose possible meaning escapes us. We are left to wonder why the authors chose to add this step which lacks any apparent theoretical justification. This bring us to question the meaning of the contribution of author Muriel Médard, a professor of information theory at MIT. According to the contributions section she is the one who “contributed to correctness proof and robustness analysis”. Presumably “correctness” meant describing the two steps needed to show that the solution to (1) is (2). But what would actually be relevant to the paper is a theorem about steps 1–7. Unfortunately such a theorem is hard to imagine.

When Feizi et al. was published in August 2013, they set up a companion website at http://compbio.mit.edu/nd/ where the code and data was made available (the data is no longer available at the site). On December 4, 2013, the authors updated the website. While there used to be one version of the program, there are now two different versions, one specifically for use on regulatory networks where the default values for the parameters are $\beta=0.5, \, \alpha =0.1$ and the other with defaults $\beta=0.9, \, \alpha=1$“Network deconvolution” is supposedly a universal method applicable to any kind of network. Will the authors continue to distribute different domain-specific versions of the program, each with their own “default” parameters somehow divined by themselves?

To summarize, “network deconvolution”, as best as we can currently tell, consists of steps 1–7 above with the parameters:

• DREAM5 challenge: $\beta = 0.5$ and $\alpha=0.1$.
• Protein network: $\beta = 0.99$ and $\alpha=1$.
• Co-authorship: $\beta=0.95$ and $\alpha=1$.

The description of the method in the main text of the paper is different from what is in the online methods, which in turn is different from what is in the supplement, which in turn is different from what is actually in each of the (two) released programs, namely the ad-hoc heuristic steps we have described above.

Having said all of that, one might be curious whether the “method” actually works in practice. Sometimes heuristics work well for reasons that are not well understood (although they are usually not published in journals such as Nature Biotechnology). Unfortunately, we have to disclose that we don’t know how the method performs on the datasets in the paper. We tried really hard to replicate the results of the paper, running the software on the provided input matrices and comparing them to the distributed output matrices (available on the NBT website). Despite our best attempts, and occasionally getting very close results, we have been unable to replicate the results of the paper. This bothers us to such an extent that I, Lior Pachter, am hereby offering \$100 to anyone (including Feizi et al.) who can provide me with the code and input to run network deconvolution and produce exactly the figures in the paper (including both versions of Figure S4) and the output matrices in the supplementary data.

Because we couldn’t replicate the results of the paper, we decided to examine how FK performs in one case for which Feizi et al. originally claimed (Supplement version July 14 2013) that their method is optimal. They stated “if the observed network is a covariance matrix of jointly Gaussian variables, ND infers direct interactions using global partial correlations”. This claim was deleted without mention in the updated supplement, however even if network deconvolution is not optimal for inferring a network from a correlation matrix (presumably what they meant in the original statement), the claimed universality suggested in the paper implies that it should be very good on any input. The figure above shows a comparison of the Feizi et al. method (FK) with the various beta parameters used in their paper to regularized partial correlation, the “geometric root” (equation (2)) and the naïve method of simply selecting the top entries of the correlation matrix. In order to test the method we performed 40 simulations of sampling 500 observations from a random Gaussian graphical model with 1000 variables and an edge density of 5% to ensure the graph was connected yet sparse. Performance was assessed by comparing the ranking of the edges to the true edges present in the model and computing the area under the corresponding ROC curve. As a benchmark we applied partial correlation structural inference based on a James-Stein shrinkage estimation of the covariance matrix (a standard approach to gene regulatory network inference, described in Schäfer, J. & Strimmer, K. A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics. Statistical Applications in Genetics and Molecular Biology 4, (2005)).

That FK does better than the most naïve method and yet worse than a method actually designed for this task is perhaps to be expected for a heuristic method that is based on a metaphor (one that dates back to Seawall Wright’s work on path coefficients in the 1920s- although Wright used (1) in the restricted setting of directed acyclic graphs where it makes sense). It is possible that in some contexts, FK will make things somewhat better. However what such contexts might be is unclear. The only guidance that Feizi et al. provide on the assumptions needed to run their method is that they state in the supplement that the model assumes that networks are “linear time-invariant flow-preserving operators”. While it is true that individual parts of that phrase mean something in certain contexts, the complete phrase is word salad. We wonder: is co-authorship flow preserving?

To conclude: Feizi et al. have written a paper that appears to be about inference of edges in networks based on a theoretically justifiable model but

1. the method used to obtain the results in the paper is completely different than the idealized version sold in the main text of the paper and
2. the method actually used has parameters that need to be set, yet no approach to setting them is provided. Even worse,
3. the authors appear to have deliberately tried to hide the existence of the parameters. It looks like
4. the reason for covering up the existence of parameters is that the parameters were tuned to obtain the results. Moreover,
5. the results are not reproducible. The provided data and software is not enough to replicate even a single figure in the paper. This is disturbing because
6. the performance of the method on the simplest of all examples, a correlation matrix arising from a Gaussian graphical model, is poor.

In academia the word fraudulent is usually reserved for outright forgery. However given what appears to be deliberate hiding, twisting and torturing of the facts by Feizi et al., we think that fraud (“deception deliberately practiced in order to secure unfair or unlawful gain”) is a reasonable characterization. If the paper had been presented in a straightforward manner, would it have been accepted by Nature Biotechnology?

Post scriptum. After their paper was published, Feizi et al. issued some press releases. They explain how original and amazing their work is. Kellis compliments himself by explaining that “Introducing such a foundational operation on networks seems surprising in this day and age” and Médard adds that “the tool can be applied to networks of arbitrary dimension.” They describe the method itself as a way to have “…expressed the matrix representing all element correlations as a function of its principal components, and corresponding weights for each component” even though principal components has nothing to do with anything in their paper. As with Barzel-Barabási, this seems worthy of nomination for the Pressies.

Post post scriptum: In September 2013, we submitted a short commentary to Nature Biotechnology with many of the points made above. While it was rejected four months later, we really really wish we could share the reviews here. Unfortunately they were sent to us with a confidentiality restriction, though having said that, any of the reviewers are hereby invited to guest post on network science on this blog. We were disappointed that Nature Biotechnology would not publish the commentary, but not surprised. Frankly, it would have been too much to expect them and their reviewers to unravel the layers of deception in Feizi et al. in the initial review process; it took us an extraordinary amount of time and effort. Having published the paper, and without a clear path to unpublication (retraction is usually reserved for faking of experimental data), there were presumably not many options. The question is whether anything will change. Feizi et al. apparently have more Nature Biotechnology papers on the way as evidenced by an extraordinary declaration on Soheil Feizi’s publication page (this is a pdf of the version when this blog was posted; Feizi changed his site an hour later) where the publication venue of their next paper appears to have been preordained:

S. Feizi, G. Quon , M. Mendoza, M. Medard, M. Kellis
Comparative Analysis of Modular Integrative Regulatory Networks across Fly, Worm and Human
in preparation for Nature Biotechnology.

The term “in preparation” is common on academic websites, especially on sites of graduate students or postdocs who are eager to advertise forthcoming work, but “in preparation for” really requires some hubris. We have to wonder, why do Feizi et al. assume that peer review is just a formality in the journal Nature Biotechnology?

[Addendum 2/23: The authors posted a rebuttal to this blog post, which we replied to here.]