Dear LJO,

It is impossible to have a meaningful discussion about mathematics when sentences are used that don’t mean anything. For example, you write that

“Thus, the criticism that the choice of beta is arbitrary is a subset of the criticism that there is no unique, optimal transformation that will as nearly as possible satisfy the model assumptions.”

What are you talking about? What “optimality” are you referring to here? What does “nearly as possible” mean? What model are you talking about? The model that includes scaling as a parameter, or the one that doesn’t?

The math in Feizi et al. is not complicated. But the paper itself describes methods exactly as you have above, which is precisely part of the fraud, because it leads people to be confused about a very simple fact: the actual model of Feizi et al. has a parameter, beta, and there is no way to know how to set it. Yes, that is a dark secret of the paper (by the way, alpha is another).

I am not willing to give Feizi et al. the benefit of the doubt that they disclosed beta, because the main text of the paper was deliberately written to mask it. Even the supplement, section 1.3, now presents it in an oblique way, where they write “This assumption can be satisfied by linearly scaling the observed dependency matrix.” It’s not an assumption(!!), it’s a parameter(!!). Second, the idea that the method is “robust to beta” is ludicrous. There are an infinite number of possible matrices that can come out of the method depending on beta, including the original input(!). Third, the results are not reproducible: not even the authors have claimed the $100 I’m offering. And why haven’t they offered an explanation for why Figure S4 does not actually match the original one (even after transforming the x-axis, which they claim was the issue)?

I have one final thought (experiment) to leave you with. Suppose network deconvolution (whatever it actually is) works as you think it might. It’s robust to some minor pre-processing (beta), and alpha also just doesn’t matter. Nothing matters. Whatever… network deconvolution is just a good thing to plug a matrix into: you always get out a better one, and biologists should start using it. Let’s go with that story for a second. So I come with my matrix, I run the ND.m code, and I get a better one. Now I have a new matrix, and I’m thinking running it through ND.m again will make it even better, right? I mean, network deconvolution, in your belief system, doesn’t make things worse, and it doesn’t really matter where the matrix being inputted came from in the first place. Mutual information, correlation matrices, coauthorship networks, or the adjacency matrix of a graph: it’s all just good stuff to plug into the thing. Great. Let’s do it a third time to clean up even more. And a fourth time. Let’s run network deconvolution over and over to clean the original matrix spick and span… not just a superficial dusting by running it once. After all, it may take more than one round to truly get rid of the pesky indirect effect. The authors themselves admit straight out that a single run of ND.m doesn’t clean out all the indirect effect. In our Gaussian graphical model example, they admit openly that there are better, optimal methods for specific domains and settings (e.g. partial correlation). But their argument is that deconvolution is just a good thing to do and doesn’t hurt.
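For anyone who wants to try the experiment at a keyboard, here is a minimal Python/NumPy sketch. The matrix is an arbitrary illustration, and `nd_step` is my own name for the core map A → A(I + A)^-1 discussed in this thread; it is not the authors’ ND.m.

```python
import numpy as np

# Hypothetical sketch of the repeated-deconvolution experiment above.
# nd_step is a minimal stand-in for the map A -> A(I + A)^-1 discussed
# in this thread; it is not the authors' ND.m.

def nd_step(a):
    n = a.shape[0]
    return a @ np.linalg.inv(np.eye(n) + a)

rng = np.random.default_rng(0)
m = rng.random((5, 5))
a = m @ m.T / 5.0        # arbitrary symmetric positive semidefinite matrix

norms = []
for _ in range(6):
    norms.append(np.linalg.norm(a))
    a = nd_step(a)

# Each eigenvalue lam maps to lam/(1 + lam), so after k passes it becomes
# lam/(1 + k*lam): repeated "cleaning" drives the matrix toward zero.
print(norms)
```

If the map removed only indirect effects, iterating it should be harmless; instead, every pass shrinks whatever is left, converging to the zero matrix in the limit.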

Please don’t just think about my experiment, try it…

P.S. I truly appreciate both your effort to work through figuring out the paper, and also your public disclosure of who you are. When you apply to graduate school (if you are not completely jaded after being exposed to the people involved in this horrible affair), please do consider Berkeley. I’m serious.

Thank you – I should say that I am an undergraduate working with Soheil and indirectly with Manolis (however, I have not spoken with them about this issue, and do not speak for them at all).

Okay, now let’s try this again. From the main text (and the Fig. 1 caption) of Feizi et al:

“Our formulation of network deconvolution has two underlying modeling assumptions: first that indirect flow weights can be approximated as the product of direct edge weights, and second, that observed edge weights are the sum of direct and indirect flows. When these assumptions hold, network deconvolution provides an exact closed-form solution for completely removing all indirect flow effects and inferring all direct interactions and weights exactly (Fig. 1d). We show that network deconvolution performs well even when these assumptions do not hold…”

My point was precisely that this assumption is dependent on scale, as you understand: if H_dir=1/const*G_dir, then it is false that const*H_dir(1-H_dir)^-1=G_dir(1-G_dir)^-1. Conversely, if H_obs=1/const*G_dir(1-G_dir)^-1, then it is false that H_dir=H_obs(1+H_obs)^-1.
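The non-commutativity of scaling with the closure map can be checked numerically. A minimal sketch (the matrix and the constant are arbitrary illustrations, not anything from the paper):

```python
import numpy as np

# Numerical check that scaling does not commute with the closure map
# A -> A(I - A)^-1. The test matrix is an arbitrary illustration.

rng = np.random.default_rng(1)
s = rng.random((4, 4))
s = s @ s.T                                      # symmetric, eigenvalues >= 0
g_dir = 0.5 * s / np.max(np.linalg.eigvalsh(s))  # largest eigenvalue now 0.5

def closure(a):
    n = a.shape[0]
    return a @ np.linalg.inv(np.eye(n) - a)

const = 2.0
h_dir = g_dir / const

lhs = const * closure(h_dir)   # scale down, apply the map, rescale
rhs = closure(g_dir)           # apply the map directly

print(np.allclose(lhs, rhs))   # False: the map is non-linear in A
```

On each eigenvalue lam of g_dir, the left side gives lam/(1 - lam/const) while the right gives lam/(1 - lam), so the two agree only when const = 1.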

Thus, the criticism that the choice of beta is arbitrary is a subset of the criticism that there is no unique, optimal transformation that will as nearly as possible satisfy the model assumptions. But the authors do not claim that there is such. They only claim that their method is exact when the assumptions hold (true, though the hypothetical is unlikely) and that it still works well when the assumptions do not hold (also true).

Now if ND were not robust to the choice of beta, and there were no good heuristic to choose it, then it might not be true that ND works well in practice when the flow assumptions fail (without being able to overfit the parameter, that is). But in fact ND turns out to be robust to the choice of beta (http://compbio.mit.edu/nd/ND_beta_effect.pdf). Unlike “number deconvolution,” in which different choices of the scaling parameter lead to totally different results, in ND, choices as disparate as .5 and .99 produce similarly good results. Basically, it appears that as long as the much weaker eigenvalue/convergence assumption is satisfied, the method performs well.
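As I understand the scaling step from this thread (this is my own reconstruction, not the authors’ ND.m), beta fixes the largest absolute eigenvalue of the rescaled observed matrix before deconvolution. One can at least verify that different betas produce numerically different output matrices; whether downstream rankings agree is the robustness question in dispute:

```python
import numpy as np

# Sketch of the beta-scaling pipeline as described in this thread
# (my own reconstruction, not the authors' ND.m): rescale G_obs so its
# largest absolute eigenvalue equals beta, then apply G(I + G)^-1.

def deconvolve(g_obs, beta):
    n = g_obs.shape[0]
    lam = np.max(np.abs(np.linalg.eigvals(g_obs)))
    g = (beta / lam) * g_obs
    return g @ np.linalg.inv(np.eye(n) + g)

rng = np.random.default_rng(2)
m = rng.random((5, 5))
g_obs = (m + m.T) / 2          # arbitrary symmetric "observed" matrix

out_lo = deconvolve(g_obs, 0.5)
out_hi = deconvolve(g_obs, 0.99)

print(np.allclose(out_lo, out_hi))  # False: different betas, different outputs
```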

Is the parameter beta a deep dark secret that discredits the whole method, or a preprocessing detail used to help satisfy the well-documented model assumptions?

P.S. I forgot to respond previously to your statement that “[ND] is based on a theoretically sound idea”. You need to keep in mind that what Feizi et al. did here is propose a model for indirect effects and then give a method which kind of does something like recovering direct effects **within that model** (it doesn’t even really do that, of course).

So ND is only even plausibly “theoretically sound” when applied to data that actually arises from their model and yet the authors have not given even a single example where their model actually applies. As another commenter pointed out, it’s fairly obvious that, e.g., mutual information will not behave in the way described by this model.

A more accurate statement would be that ND is a heuristic method based on an intuitive, but theoretically unjustified, idea.

Actually, your example is even more confusing than I originally noticed; e.g. you have H_obs and G_obs, but only one matrix is being observed in this context, so I’m not sure what exactly you’re doing here. If you’d like to write it out in more detail, then we could actually go over the math, but one thing I can guarantee you is that if the matrix is scaled after performing the geometric sum*, then you will **not** recover the original matrix up to a constant factor by the operation A(I + A)^-1. That operation is non-linear, and so a linear scaling of A has a non-linear effect on the output.

About the discussion of units, I’m again somewhat confused. Of course there is no unit u for which u^2 = u. If there were, dimensional analysis would have some serious problems.

The introduction of scaling takes you from a scenario where every possible input matrix to the method has a unique output to one where every input matrix has an infinite number of possible outputs, depending on the value of a parameter. If you don’t think that’s the kind of thing that a reader might like to know about, I really don’t know what to tell you.

And while I let this slide before because your comment was mostly actual content rather than opinion, if you are going to be offering your personal opinions about the paper, you really, really ought to disclose the fact that you have a direct relationship with one of the authors of the paper. To fail to do so would be… “somewhat opaque”, at best.

(* Please, let’s not use the term “transitive closure” since that term actually has an accepted meaning in other contexts.)

What I mean by “H_obs on the scale of H_dir” is precisely that H_obs=1/const*G_obs, and yes, trivially, then const*H_dir=G_obs(1+G_obs)^-1. I should clarify scale to mean units: the linear flow assumption implies that there is a relation on the units, u^2=u. That way, matrix multiplication makes sense when you do TC. But if your scale is wrong, it is false that (u/const)^2=u/const. This is what I mean when I say the rescaling is natural, and why I disagree that it “seriously problematizes” anything beyond the original flow assumption. The paper is somewhat opaque, sure, but fraudulent? I do not buy that clarifying this issue would have led to the paper’s rejection by Nature Biotech.

*First, it does not appear that the authors have attempted to cover up the importance of the scaling. The fifth sentence of the Methods section is “Note that, the observed dependency matrix is linearly scaled so that the largest absolute eigenvalue of G_dir < 1,”*

It’s not the importance of scaling (or rather its existence) that we suggested was covered up but rather its implications. Really, is this not a truly bizarre way to, effectively, introduce a parameter? It’s phrased as though this were some minor technical detail rather than something which seriously problematizes the nice clean picture they portray in their main figure on the method. When I first read that sentence, it took me a minute to realize that different scalings will give different results and I’m apparently considerably more familiar with this type of material than most readers.

A normal way to write that would have been “The observed dependency matrix is linearly scaled so that the largest absolute eigenvalue of G_dir is equal to β, a parameter of the method.” But at that point, the reader would probably like to know how this parameter is set, how it affects the results, and all that other murky stuff.

No, better not to mention it.

*Though the parameter beta is not mentioned directly, it is totally clear that preprocessing is required: “G_obs can be derived…” clearly implies that the input matrix is not just the original adjacency matrix.*

That sentence is referring to the original calculation of G_obs from data, e.g. by taking correlations between gene expression measurements. I’m not sure what you’re referring to by “the original adjacency matrix” but that’s as original as it gets here.

Regarding your “Second,”, I see you’ve phrased everything in terms of adjacency matrices and transitive closures. While the authors did say things like “transitive closure of a weighted adjacency matrix”, as far as I can tell, there is no commonly accepted notion of such a thing. And, yes, it’s for the same reasons we’ve been discussing here: under the two definitions that Feizi et al. could be using, either some matrices don’t have a transitive closure, or every matrix has infinitely many. But the fact that their notion of “transitive closure” has exactly the same problems is another fault of the paper, not a defense. (We possibly should have brought this up in the original post, but there were so many strange things going on, we couldn’t address it all.)

*Given a binary matrix of direct interactions H_dir, if we wish to perform transitive closure, H_dir must be scaled down so that second order effects have smaller size than first order effects (and even if H+H^2+… already converges the scale may be wrong). The natural way is to take G_dir=const*H_dir. Given H_obs on the scale of H_dir, it is appropriate to take G_obs=const*H_obs, and G_dir=G_obs(1+G_obs)^-1.*

While I’m not sure what you might mean by “(and even if H+H^2+… already converges the scale may be wrong)” or “Given H_obs on the scale of H_dir”, you’ve written both G_dir=const*H_dir and G_dir = G_obs(1+G_obs)^-1, which might lead someone to conclude that const*H_dir = G_obs(1+G_obs)^-1. So I’d just like to note that that is not correct.

You appear to have some notion of “transitive closure” which is even more particular than anything hinted at in Feizi et al. But whatever your notion is, surely you’ll agree that what you’re doing there is not, in fact, inverting it: you began with H_dir and your end result is not H_dir. This would be in contrast to Feizi et al. where we find statements such as “network deconvolution takes a global approach by directly inverting the transitive closure of the true network.”

That statement is absolutely false. And the authors must have known it was false.

Do you really consider this to be acceptable behaviour?

Thanks for your comment, and for taking the time to write a thoughtful response to our allegations.

Unfortunately I don’t understand your claim that “the scaling parameter is intuitively necessary, and its presence does not diminish the justifiability of the method.” The point of the “number deconvolution” post was to make it clear that once scaling is introduced into the model, there are an infinite number of possible solutions to the inverse problem. Without knowledge of how the observed matrices were scaled, how is one supposed to figure out how to scale? The authors themselves provide no theoretical or practical approach to scaling, other than to offer, on their FAQ: “For the regulatory network, we used beta=0.5 as we expected indirect flows to be propagating more slowly” (it should be noted that even that was posted only recently as a response to our blog post). In the original paper seen by the reviewers and published on July 16th, aside from a single sentence in the main paper, there was only an explanation and demonstration in the supplement suggesting it should always be set (close) to 1, thereby implying it wasn’t really a parameter at all. This was a lie. It’s really not accurate to say it’s “slightly more ad hoc than Fig. 1 makes it seem”. Figure 1 is not ad hoc at all, whereas the method is not ad hoc, but incoherent.

As you say yourself, I think there is a very big difference between publishing a paper with errors or oversights, as opposed to deliberately brushing a significant and problematic issue under the rug so that reviewers and readers don’t catch on to it. To make concrete how serious a problem the scaling factor is, suppose you actually wanted to use ND tomorrow on a problem where you did not already know what the answer should be. How would you choose your scaling factor to improve your network? How would you know the choice you picked improved it at all, as opposed to degrading its quality? Would you really use ND having understood what I explain in my post?

First, it does not appear that the authors have attempted to cover up the importance of the scaling. The fifth sentence of the Methods section is “Note that, the observed dependency matrix is linearly scaled so that the largest absolute eigenvalue of G_dir < 1,” and most of the second paragraph deals with this scaling. Though the parameter beta is not mentioned directly, it is totally clear that preprocessing is required: “G_obs can be derived…” clearly implies that the input matrix is not just the original adjacency matrix. There is no deception here.

Second, whereas the "number deconvolution" analogy makes the scaling appear very illegitimate, it seems natural that such a step would be necessary for an adjacency matrix whose scale may be arbitrary in the first place. No sophisticated data analysis tool is a black box, and nothing in the paper suggests that ND should be used as such. Rescaling is the natural way to address the fact that the magnitude of a second-order effect as defined by transitive closure, compared with the magnitude of the first order effects, is scale dependent. This is most obvious going in the reverse direction: Given a binary matrix of direct interactions H_dir, if we wish to perform transitive closure, H_dir must be scaled down so that second order effects have smaller size than first order effects (and even if H+H^2+… already converges the scale may be wrong). The natural way is to take G_dir=const*H_dir. Given H_obs on the scale of H_dir, it is appropriate to take G_obs=const*H_obs, and G_dir=G_obs(1+G_obs)^-1. The scaling parameter is intuitively necessary, and its presence does not diminish the justifiability of the method.
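For concreteness, the “transitive closure” being used throughout this discussion is the matrix geometric series: when the spectral radius of G is below 1, G + G^2 + G^3 + … converges to G(1 - G)^-1. A quick numerical confirmation with an arbitrary matrix (an illustration of the identity only, not of either side’s argument):

```python
import numpy as np

# The "transitive closure" in this discussion is the matrix geometric
# series: if the spectral radius of g is below 1, then
# g + g^2 + g^3 + ... = g(I - g)^-1. Arbitrary matrix as a sanity check.

rng = np.random.default_rng(3)
m = rng.random((4, 4))
g = 0.4 * m / np.max(np.abs(np.linalg.eigvals(m)))  # spectral radius 0.4

n = g.shape[0]
closed_form = g @ np.linalg.inv(np.eye(n) - g)

partial = np.zeros_like(g)
power = np.eye(n)
for _ in range(60):        # 0.4^60 is far below floating-point noise
    power = power @ g
    partial += power

print(np.allclose(partial, closed_form))  # True: the series converges
```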

Grasping the necessity of scaling and closely reading paragraphs 1-2 of Methods, there is no evidence of fraud. ND might be slightly more ad hoc than Fig. 1 makes it seem, but it is based on a theoretically sound idea and Fig. S4 shows that it is fairly robust in practice. The "gap between the main text and the method" is not "too great" but rather, much too small to support the heavy allegation of fraud.
