Two weeks ago, in my post Pachter’s P-value Prize, I offered $100/p for justifying a reasonable null model and a p-value (p) associated with the statement “Strikingly, 95% of cases of accelerated evolution involve only one member of a gene pair, providing strong support for a specific model of evolution, and allowing us to distinguish ancestral and derived functions” in the paper
M. Kellis, B.W. Birren and E.S. Lander, Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae, Nature 2004 (hereafter referred to as the KBL paper).
Today I am happy to announce the winner of the prize. But first, I want to thank the many readers of the blog who offered comments (>135 in total) that are extraordinary in their breadth and depth, and that offer a glimpse of what scientific discourse can look like when not restricted to traditional publishing channels. You have provided a wonderful public example of what “peer review” should mean. Coincidentally, and to answer one of the questions posted, the blog surpassed one million views this past Saturday (the first post was on August 19th, 2013), a testament to the fact that the collective peer reviewing taking place on these pages is not only of very high quality, but also having an impact.
I particularly want to thank the students who had the courage to engage in the conversation, and also the faculty who published comments using their names. In that regard, I admire and commend Joshua Plotkin and Hunter Fraser for deciding to deanonymize themselves by agreeing to let me announce here that they were the authors of the critique sent to the authors in April 2004, which was initially posted as an anonymous comment on the blog.
The discussion on the blog was extensive, touching on many interesting issues, and I summarize only a few of the threads here, focusing on key points in order to provide context and justification for my post and for my selection of the prize winner.
The value of post-publication review
One of the comments made in response to my post that I’d like to respond to first was by an author of KBL, who dismissed the entire premise of my challenge, writing “We can keep debating this after 11 years, but I’m sure we all have much more pressing things to do (grants? papers? family time? attacking 11-year-old papers by former classmates? guitar practice?)”
This comment exemplifies the proclivity of some authors to view publication as the encasement of work in a casket, buried deeply so as to never be opened again lest the skeletons inside it escape. But is it really beneficial to science that much of the published literature has become, as Ferguson and Heene noted, a vast graveyard of undead theories? Surely the varied and interesting comments posted in response to my challenge (totaling >25,000 words and 50 pages in Arial 11 font), demonstrate the value of communal discussion of science after publication.
For the record, this past month I did submit a paper and also a grant, and I did spend lots of time with my family. I didn’t practice the guitar but I did play the piano. Yet in terms of research, for me the highlight of the month was reading and understanding the issues raised in the comments to my blog post. Did I have many other things to do? Sure. But what is more pressing than understanding if the research one does is to be meaningful?
The null model
A few years ago I introduced a new two-semester freshman math course at UC Berkeley for intended biology majors called “Math 10: Methods of Mathematics: Calculus, Statistics and Combinatorics”. One of the key ideas we focus on in the first semester is that of a p-value. The idea of measuring the significance of a biological result via a statistical computation involving probabilities is somewhat unnatural, and feedback from the students confirms what one might expect: the topic of p-values is among the hardest in the course. Math for biologists turns out to be much harder than calculus. I believe that at Berkeley we are progressive in emphasizing the importance of statistics for biology majors at the outset of their education (to be clear, this is a recent development). The prevailing state is one of statistical illiteracy, and the result is that p-values are frequently misunderstood, abused, and violated in just about every way imaginable (see, e.g., here, here and here).
P-values require a null hypothesis and a test statistic, and of course one of the most common misconceptions about them is that when they are large they confirm that the null hypothesis is correct. No! And worse, a small p-value cannot be used to accept an alternative to the null, only to (confidently) reject the null itself. And rejection of the null comes with numerous subtle issues and caveats (see arguments against the p-value in the papers mentioned above). So what is the point?
I think the KBL paper makes for an interesting case study of when p-values can be useful. For starters, the construction of a null model is already a useful exercise, because it is a thought experiment designed to test one’s understanding of the problem at hand. The senior author of the KBL paper argues that “we were interested in seeing whether, for genes where duplication frees up at least one copy to evolve rapidly, the evidence better fits one model (“Ohno”: only one copy will evolve quickly) or an alternative model (both genes will evolve quickly).” While I accept this statement at face value, it is important to acknowledge that if there is any science to data science, it is the idea that when examining data one must think beyond the specific hypotheses being tested and consider alternative explanations. This is the essence of what my colleague Ian Holmes is saying in his comment. In data analysis, thinking outside of the box (by using statistics) is not optional. If one is lazy and resorts to intuition, then, as Páll Melsted points out, one is liable to end up with fantasy.
The first author of KBL suggests that the “paper was quite explicit about the null model being tested.” But I was unsure whether to assume that the one-gene-only-speeds-up model was the null, based on “we sought to distinguish between the Ohno one-gene-only speeds-up (OS) model and the alternative both-genes speed-up (BS) model”, or whether the null was the BS model, because “the Ohno model is 10^87 times more likely, leading to significant rejection of the BS null”. Or was the paper being explicit about not having a null model at all, because “Two alternatives have been proposed for post-duplication”, or was it the opposite, i.e. two null models: “the OS and BS models are each claiming to be right 95% of the time”? I hope I can be forgiven for failing, despite trying very hard, to identify a null model in either the KBL paper or the comments of the authors on my blog.
There is, however, a reasonable null model, and it is the “independence model”, which, to be clear, is the model in which each gene after duplication “accelerates” independently with some small probability (80/914). The suggestions that “the independence model is not biologically rooted” or that it “would predict that only 75% of genes would be preserved in at least one copy, and that 26% would be preserved in both copies” are of course absurd, as Erik van Nimwegen explains clearly and carefully. The fact that many entries converged on the same null model in this case is reassuring. I think it qualifies as a “reasonable model” (thereby passing the threshold for my prize).
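To make this concrete, here is a minimal sketch in R (an illustration of my own, not any particular submitted entry) of what the independence model predicts for the 457 retained pairs when the per-gene acceleration probability is taken to be 80/914:

# Independence null: each of the 2 x 457 retained genes "accelerates"
# independently with probability q = 80/914.
q <- 80/914
n.pairs <- 457
p.at.least.one <- 1 - (1 - q)^2    # P(pair has at least one accelerated gene)
p.both <- q^2                      # P(pair has both genes accelerated)
n.pairs * p.at.least.one           # expected pairs with at least one: ~76.5
n.pairs * p.both                   # expected pairs with both: ~3.5
1 - p.both / p.at.least.one        # expected fraction with only one: ~0.95

In other words, under independence one expects about 76 pairs with at least one accelerated gene, and among those, roughly 95% are expected to have only one accelerated member.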
The p-value
One of my favorite missives about p-values is by Andrew Gelman, who in “P-values and statistical practice” discusses the subtleties inherent in the use and abuse of p-values. But as my blog post illustrates, subtlety is one thing, and ignorance is an entirely different matter. Consider, for example, the entry by Manolis Kellis, who submitted a likelihood ratio in place of a p-value, thus claiming that I owe him 903,659,165 million billion trillion quadrillion quintillion sextillion dollars (even more than the debt of the United States of America). His entry will not win the prize, although the elementary statistics lesson that follows is arguably worth a few dollars (for him). First, while it is true that a p-value can be computed from the (log) likelihood ratio when the null hypothesis is a special case of the alternative hypothesis (using the chi^2 distribution), the ratio of two likelihoods is not a p-value! Probabilities of events are also not p-values! For example, the comment that “I calculated p-values for the exact count, but the integral/sum would have been slightly better” is a non-starter. Even though KBL was published in 2004, this is apparently the level of understanding of p-values, in 2015, of one of the authors, a senior computational biologist and professor of computer science at MIT. Wow.
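To spell out the distinction, here is a small sketch of the one setting in which a likelihood ratio can legitimately be converted into a p-value, namely nested models and the chi^2 approximation (the counts below are invented purely for illustration and have nothing to do with the yeast data):

# Null: X ~ Binomial(n, 0.5) with the success probability fixed.
# Alternative: the success probability is free, so the null is nested in it.
x <- 60; n <- 100; p0 <- 0.5
p.hat <- x / n
logL <- function(p) dbinom(x, n, p, log = TRUE)
lr.stat <- 2 * (logL(p.hat) - logL(p0))      # 2 x log likelihood ratio
pchisq(lr.stat, df = 1, lower.tail = FALSE)  # approximate p-value, ~0.045

A raw ratio of likelihoods between two models that are not nested in this way is not such a quantity, and neither is the probability of the observed counts under one of the models.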
So what is “the correct” p-value? It depends, of course, on the test statistic. Here is where I will confess that, like many professors, I had an answer in mind before asking the question. I was thinking specifically of the setting that leads to 0.74 (or 0.72/0.73, depending on roundoff and approximation). Many entries came up with the same answer I had in mind, and when I saw them I was relieved: I owed $135, which is what I had budgeted for the exercise. I was wrong. The problem with the answer 0.74 is that it answers a specific question: what is the probability of seeing 4 or fewer pairs in which both genes accelerated, out of 76 pairs in which at least one gene accelerated? A better test statistic was proposed by Pseudo, who asked for the probability that 5% or less of the pairs with at least one accelerating gene have both genes accelerating, when examining data generated from the null model with all 457 pairs. This is a subtle but important distinction, and it provides a stronger result (albeit with a smaller p-value). The KBL result is not striking even without conditioning on the specific number of pairs observed to have at least one accelerated gene. Of course, p=0.64 does not mean the independence model is correct; what it does mean is that the data as presented simply weren’t “striking”.
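For concreteness, here is a rough sketch of both calculations under the independence null (this is not Pseudo's actual code, and the Monte Carlo answer depends on exactly how "5% or less" is thresholded):

# Independence null with per-gene acceleration probability q = 80/914.
set.seed(1)
q <- 80/914

# (a) Conditional version: given 76 pairs with at least one accelerated gene,
# each has both accelerated with probability q^2 / (1 - (1-q)^2); ask for the
# probability of 4 or fewer such pairs.
p.both.given.one <- q^2 / (1 - (1 - q)^2)
pbinom(4, size = 76, prob = p.both.given.one)   # ~0.73

# (b) Pseudo's version: simulate all 457 pairs and ask how often at most 5%
# of the pairs with at least one accelerated gene have both accelerated.
sims <- replicate(1e5, {
  g1 <- rbinom(457, 1, q)
  g2 <- rbinom(457, 1, q)
  sum(g1 & g2) <= 0.05 * sum(g1 | g2)
})
mean(sims)   # estimate of the p-value for Pseudo's statistic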
One caveat in the above analysis is that the arbitrary threshold used to declare “acceleration” is problematic. For example, one might imagine that other thresholds produce more convincing results, i.e. results farther from the null, but of course even if that were true, the use of an arbitrary cutoff was a poor approach to analysis of the data. Fortunately, regarding the specific question of the threshold’s impact on the analysis, we do not have to imagine. Thanks to the diligent work of Erik van Nimwegen, who went to the effort of downloading the data and reanalyzing it with different thresholds (from 0.4 to 1.6), we know that the null cannot be rejected with other thresholds either.
The award
There were many entries submitted and I read them all. My favorite was by Michael Eisen for his creative use of multiple testing correction, although I’m happier with the direction that yields $8.79. I will not be awarding him the prize though, because his submission fails the test of “reasonable”, although I will probably take him out to lunch sometime at Perdition Smokehouse.
I can’t review every single entry here (this post, which is already too long, would become unbearable), but I did think long and hard about the entry of K. It doesn’t directly answer the question of why the 95% number is striking, nor do I completely agree with some of the assumptions (e.g. that if neither gene in a pair accelerates then the parent gene was not accelerated pre-WGD). But I’ll give the entry an honorable mention.
The prize will be awarded to Pseudo for defining a reasonable null model and test statistic, and producing the smallest p-value within that framework. With a p-value of 0.64 I will be writing a check in the amount of $156.25. Congratulations Pseudo!!
The biology
One of the most interesting results of the blog post was, in my opinion, the extensive discussion about the truth. Leaving aside the flawed analysis of KBL, what is a reasonable model for evolution post-WGD? I am happy to see the finer technical details continue to be debated, and the intensity of the conversation is awesome! Pavel Pevzner’s cynical belief that “science fiction” is not a literary genre but rather a description of what is published in the journal Science may be realistic, but I hope the comments on my blog can change his mind about what the future can look like.
In lieu of trying to summarize the scientific conversation (frankly, I don’t think I could do justice to some of the intricate and excellent arguments posted by some of the population geneticists) I’ll just leave readers to enjoy the comment threads on their own. Comments are still being posted, and I expect the blog post to be a “living” post-publication review for some time. May the skeletons keep finding a way out!
The importance of wrong
Earlier in this post I admitted to being wrong. I have been wrong many times. Even though I’ve admitted some of my mistakes on this blog and elsewhere in talks, I would like to joke that I’m not going to make it easy for you to find other flaws in my work. That would be a terrible mistake. Saying “I was wrong” is important for science and essential for scientists. Without it people lose trust in both.
I have been particularly concerned with a lack of “I was wrong” in genomics. Unfortunately, a culture has developed among “leaders” in the field in which the three words admitting error or wrongdoing are taboo. The recent paper of Lin et al. critiqued by Gilad-Mizrahi is a good example. Leaving aside the question of whether the result in the paper is correct (there are strong indications that it isn’t), Gilad-Mizrahi began their critique on twitter by noting that the authors had completely failed to account for, or even discuss, batch effects. Nobody, and I mean nobody, who works on RNA-Seq would imagine for even a femtosecond that this is ok. It was a major oversight and mistake. The authors, any of them really, could have just come out and said “I was wrong”. Instead, the last author on the paper, Mike Snyder, told reporters that “All of the sequencing runs were conducted by the same person using the same reagents, lowering the risk of unintentional bias”. Seriously?
Examples abound. The “ENCODE 80% kerfuffle” involved claims that “80% of the genome is functional”. Any self-respecting geneticist recognizes such headline grabbing as rubbish. Ewan Birney, a distinguished scientist who has had a major impact on genomics, having been instrumental in the ENSEMBL project and many other high-profile bioinformatics programs, defended the claim on the BBC:
“EB: Ah, so, I don’t — It’s interesting to reflect back on this. For me, the big important thing of ENCODE is that we found that a lot of the genome had some kind of biochemical activity. And we do describe that as “biochemical function”, but that word “function” in the phrase “biochemical function” is the thing which gets confusing. If we use the phrase “biochemical activity”, that’s precisely what we did, we find that the different parts of the genome, [??] 80% have some specific biochemical event we can attach to it. I was often asked whether that 80% goes to 100%, and that’s what I believe it will do. So, in other words, that number is much more about the coverage of what we’ve assayed over the entire genome. In the paper, we say quite clearly that the majority of the genome is not under negative selection, and we say that most of the elements are not under pan-mammalian selection. So that’s negative selection we can detect between lots of different mammals. [??] really interesting question about what is precisely going on in the human population, but that’s — you know, I’m much closer to the instincts of this kind of 10% to 20% sort of range about what is under, sort of what evolution cares about under selection.”
This response, and others by members of the ENCODE consortium upset many people who may struggle to tell apart white and gold from blue and black, but certainly know that white is not black and black is not white. Likewise, I suspect the response of KBL to my post disappointed many as well. For Fisher’s sake, why not just acknowledge what is obvious and true?
The personal critique of professional conduct
A conversation topic that emerged as a result of the blog (mostly on other forums) is the role of style in online discussion of science. Specifically, the question of whether personal attacks are legitimate has come up previously on my blog pages and also in conversations I’ve had with people. Here is my opinion on the matter:
Science is practiced by human beings. Just like with any other human activity, some of the humans who practice it are ethical while others are not. Some are kind and generous while others are… not. Occasionally scientists are criminal. Frequently they are honorable. Of particular importance is the fact that most scientists’ behavior is not at any of these extremes, but rather a convex combination of the mentioned attributes and many others.
In science it is people who benefit, or are hurt, by the behavior of scientists. Preprints on the bioRxiv do not collect salaries, the people who write them do. Papers published in journals do not get awarded or rejected tenure, people do. Grants do not get jobs, people do. The behavior of people in science affects… people.
Some argue for a de facto ban on discussing the personal behavior of scientists. I agree that the personal life of scientists is off limits. But their professional life shouldn’t be. When Bernie Madoff fabricated gains of $65 billion it was certainly legitimate to criticize him personally. Imagine if that was taboo, and instead only the technical aspects of his Ponzi scheme were acceptable material for public debate. That would be a terrible idea for the finance industry, and so it should be for science. Science is not special among the professions, and frankly, the people who practice it hold no primacy over others.
I therefore believe it is not only acceptable but imperative to critique the professional behavior of persons who are scientists. I also think that doing so will help eliminate the problematic devil–saint dichotomy that persists with the current system. Having developed a culture in which personal criticism is outlawed in scientific conversations while only science is fair fodder for public discourse, we now have a situation where scientists are all presumed to be living Gods, or else serious criminals to be outlawed and banished from the scientific community. Acknowledging that there ought to be a grey zone, and developing a healthy culture where critique of all aspects of science and scientists is possible and encouraged would relieve a lot of pressure within the current system. It would also be more fair and just.
A final wish
I wish the authors of the KBL paper would publish the reviews of their paper on this blog.
Comments
June 9, 2015 at 2:03 am
GM
This has indeed been a tremendously enlightening exercise. However, I am pessimistic about the prospects of the situation improving – the behavior you’re criticizing is to some extent purely sociological in its origin, but there is also the hard reality of increasing competition for limited resources and a general public that is completely incapable of understanding the fine points of real scientific debate (for which state of affairs scientists share a lot of the blame, by so often willingly going along with the popular portrayal of themselves as infallible geniuses). If ENCODE had launched an equally aggressive media campaign to correct the wrong impression it gave to the public about the extent of functionality of the genome with its original announcements, we all know what the reaction would have been.
We should strive to fix things, but as long as those larger factors persist, not much will change…
June 9, 2015 at 2:18 pm
dhaus
While the ENCODE consortium may have made wrong statements about junk DNA, the project did generate a lot of data which has been used extensively in the bioinformatics community so I do think the project had value (whether they could have done more informative experiments is another debate).
In fact, my guess is that the whole 80% comment was more a way to sell the results of the project to the public. If so, it definitely backfired.
June 12, 2015 at 8:39 am
Manolis Kellis
Dear Lior, you are confused about multiple aspects:
—-
First, you are confused about the seven sentences of our paper. Let’s review them briefly.
1. We wrote: “Two alternatives have been proposed for post-duplication *divergence* of duplicated gene pairs. Ohno has hypothesized that after duplication, one copy would preserve the original function and the other copy would be free to diverge (Ohno 1970). However, this model has been contested by others, who argued that both copies would diverge more rapidly and acquire new functions (Force, Lynch, et al).”
Summary: When gene functions diverge, Ohno 1970 predicts asymmetric divergence, but Force (challenging Ohno) predicts symmetric divergence.
2. We then wrote: “We found that 76 of the 457 gene pairs (17%) show accelerated protein evolution relative to K. waltii, defined as instances in which the amino acid substitution rate along one or both of the S. cerevisiae branches was at least 50% faster than the rate along the K. waltii branch. We note that this calculation is conservative and may miss some cases of accelerated evolution (see Methods).”
Summary: As a proxy for accelerated divergence, we use 1.5-fold rate increase, understanding this is conservative.
3. We then wrote: “Strikingly, in nearly every case (95%), accelerated evolution was confined to only one of the two paralogues. This strongly supports *the* model in which one of the paralogues retained an ancestral function while the other, relieved of this selective constraint, was free to evolve more rapidly (Ohno 1970).”
Summary: Between the two alternatives described, the data strongly favor Ohno 1970 (which lives another day) over Force 1999 (which is rejected).
It’s that simple. Your disagreement indicates that either: (i) you thought we meant something else, or (ii) you wish we had asked a different question.
—-
Second, you are confused about Ohno 1970. You seem to think that Ohno suggested acceleration or loss as the only possible fates of duplicated genes. This is incorrect. You should read Part 3 of Ohno’s book, where each fate is discussed in consecutive chapters X-XIII: gene dosage and gene conversion, differential regulation, heterozygous allele fixation, and potentially new functions by accelerated evolution. Many pairs are maintained without acceleration, and thus to test symmetric vs. asymmetric divergence, we focus on accelerated pairs only.
—-
Third, you are confused about my calculations. Our paper compared two models (Ohno vs Force), so a likelihood ratio is the appropriate computation. Instead, you asked for a P-value for a null model. So I obliged and provided the probability of our exact counts under the alternate model (Force), but I also provided the more appropriate likelihood ratio between the compared models (Ohno vs Force) for two levels of confidence. If one uses confidence 1 (instead of .55-.95) and a one-tailed test (instead of the exact counts), the difference in likelihood becomes even stronger (and your debt infinite).
—-
Fourth, you are confused about the independence model you advocate. Both Ohno and Force agree that relaxation of selection in one copy is *conditional* on another gene copy maintaining the old function (or subfunction), and both make biological sense. Independence advocates that relaxation happens at random, *independently* of the other copy, which makes little biological sense. The gene loss patterns indicate that this is simply not true, and rejects independence. You can continue arguing for independent relaxation of selection in duplicated pairs, but (i) you’ll be wrong, and (ii) it was simply not part of our paper.
—-
Fifth, you insist on testing independence only on 457 gene pairs. Both Ohno and Force advocate pressure to maintain ancestral functions/subfunctions for *all* pairs (not just the 457 pairs). Independence would instead argue no such pressure for *all* pairs (not just the 457 pairs). To clarify, *all* yeast genes were duplicated (not just 457). The 457 pairs are simply the ones that were *kept* in two copies.
—-
Lastly, if you want to read about symmetric vs. asymmetric divergence beyond the seven sentences of our paper, a great starting point is Byrne and Wolfe 2007 http://www.ncbi.nlm.nih.gov/pubmed/17194778 who studied relaxed selection for duplicate pairs using both acceleration and loss, and who arrive at a similar conclusion (strongly in favor of Ohno vs. Force).
—-
I admire your enthusiasm and the energy you employ to study our papers. I also admire your courage in offering such strong opinions on areas outside your research focus. But please let’s focus on the science. I hope my posts help clarify the scientific issues.
June 12, 2015 at 11:56 am
Tyler DeGruttola
Dr Kellis, can you please clarify whether your underlying null model is H0: Ohno = Force? If not, is it possible to write it down in this way? Without clarifying your null model in an explicit form widely accepted in statistics, there will always be misunderstanding between us, because it is not trivial.
June 12, 2015 at 2:43 pm
Tyler Degruttola
With all due respect, no matter which null model we are using, the analysis focusing on 76 gene pairs alone will send the nurse in the Netherlands to jail: http://www.math.leidenuniv.nl/~gill/
June 9, 2015 at 6:03 am
Claudiu Bandea
As posted above by GM, “this has indeed been a tremendously enlightening exercise”. However, unlike GM, I am optimistic about the prospects of the situation improving.
Indeed, if the ‘behavior’ that Lior criticizes continues to be criticized, then few if any scientists will be willing to engage in it. Indeed, very few people, if any, will be ‘robbing banks’ knowing that their actions are fully monitored. I made this comment in one of Lior’s previous posts, but I think it might be worth re-posting in the context of this discussion ( https://liorpachter.wordpress.com/2014/04/30/estimating-number-of-transcripts-from-rna-seq-measurements-and-why-i-believe-in-paywall/):
“The traditional, closed peer-review system and the conventional ‘civility’ associated with the science enterprise, have allowed, if not encouraged, people to prosper by misrepresenting facts and overhyping their work (at the expense of science and their colleagues), without the *fear* of open and explicit exposure.
And, unfortunately, in order to be able to compete in such a corrupt environment, many of their peers had little choice but to lower their ethical and scientific standards and join this unproductive and reckless competition; and, by doing so, all of them (even the most reckless ones, who often display embarrassing CVs inflated with pompous titles and rewards) have become victims of the system.
So, the problem is primarily with the system, not with the people; we know that people have the potential of doing wonderful or despicable things, depending on the system in which they operate: just look at history.
Fortunately, the solution to this growing problem in science is relatively simple: establish a comprehensive and open peer-review system at all levels of scientific enterprise (including science funding and communication) that would allow and encourage full peer participation. As I previously mentioned, very few people, if any, will be robbing banks knowing that their actions are fully monitored. Also, very few people, who do not deserve it, will be rewarded, if the decisions about the rewards are made by the entire community of peers, not by a restricted group of people linked in a network of corrupted influence that is protected by the conventional ‘civility’ they promote.
Fortunately, scientific and scholarly publishing is highly compatible with an open evaluation approach, as the whole reason for writing manuscripts is to have them read and evaluated by peers. To be specific, in the Pachter/Kellis case, my suggestion would be that they fully evaluate each other’s publications, as well as those of their colleagues. And, although this or other blogs are great venues for extended evaluations, I would suggest presenting them, at least in summary, on other platforms, such as PubMed Commons, which would increase their corrective and preventive power.”
I would also highly recommend Michael Eisen’s recent post on this subject at his blog “it is NOT junk”:
http://www.michaeleisen.org/blog/?p=1718
June 9, 2015 at 9:26 am
Claudiu Bandea
Correction: the correct web-link for the re-posted comment above is: https://liorpachter.wordpress.com/2014/02/12/why-i-read-the-network-nonsense-papers/
The original link was to another post by Lior, in which he addressed the ENCODE fiasco. Not surprisingly though, ENCODE has made its way into Lior’s latest post in the section “The importance of wrong”.
Well, I think Lior is ‘wrong’ about the ENCODE story, as ENCODE *did not* make the claim in their published scientific papers that “80% of the human genome is functional”; that claim was fed to the media by a few reckless ENCODE leaders. On the contrary, in what seems to have been a concerted but tacit ‘silence policy,’ the ENCODE authors (obviously under the direction/control of the leaders mentioned above) went out of their way not to address the ‘functionality’ of the human genome in their publications. Consequently, the authors who criticized the presumed ENCODE conclusion that “80% of the human genome is functional” had little choice but to set up the premise for their articles using secondary, ‘newspaper’ references (see the PubMed Commons comment on Ford Doolittle’s article “Is junk DNA bunk? A critique of ENCODE” at: http://1.usa.gov/1oYpytp).
So, why didn’t ENCODE address the ‘functionality’ of the human genome in their scientific publications? This makes no sense considering that their massive and expensive project was funded specifically to annotate the ‘functional sequences’ of the human genome. And why wasn’t the project designed in the context of the fundamental issues and knowledge about genome biology, functionality and evolution, such as the C-value paradox, limited sequence conservation among closely related species, mutational load, and the evolutionary origin of most genomic sequences from transposable elements?
Well, for ‘good’ reasons: this knowledge indicates that only a relatively small fraction of the human genome can have informational functions. So, it was a high appreciation of this fundamental knowledge that prompted the ‘silence’ orchestrated by a few of the ENCODE leaders, because this knowledge was in conflict with some of the ENCODE study objectives, and it would have raised inconvenient questions about the relevance of the entire study and its results, as well as its funding.
Interestingly, in a recent article published in PLOS Genetics (http://www.ncbi.nlm.nih.gov/pubmed/25950175), Doolittle has echoed the same points:
“…there are two devastating things you can say about the ENCODE people. One is that they completely ignored all that history about junk DNA and selfish DNA. There was a huge body of evidence that excess DNA might serve some structural role in the chromosomes, but not informational. They also ignored what philosophers of biology have spent a lot of time asking: what do you mean by “function?”
“ENCODE wouldn’t have got funded had they said 80% of the human genome is just junk, transposable elements.”
However, Doolittle is wrong in stating that “…there are two devastating things you can say about the ENCODE people”; these “devastating things” are not about the ENCODE people, but about a few leaders of the project, and this should be acknowledged in any critique of ENCODE.
June 9, 2015 at 6:30 am
edermitzakis
Lior, thanks for the insightful post. I agree (!) with most of what you say regarding the value of admitting that one is wrong, and with the overall healthy discussion. I also agree that people are behind science, and so it is natural for people to come under scrutiny. Let me, however, suggest one qualifier: while one can criticize analyses, experimental choices, behaviors, etc. in individual cases, that does not mean these criticisms generalize to other activities of the individual. In other words, it is one thing to criticize a paper of one scientist and another to discount all the scientific work of a person based on bad choices in one or two papers. This last part is what I find inappropriately personal, and it can happen accidentally or intentionally.
One relevant issue is that many people are asking for double blind reviews (i.e. both authors and reviewers are anonymous). This goes against open criticism and the notion that individuals are behind science. I personally find this wrong and I strongly support a transparent review process. I wonder where you stand on this.
Finally, I fully agree with you about a culture where criticism is acceptable. Civilized communication does not mean patting each other’s backs, but rather criticizing with an effort to improve someone’s work and facilitate discovery. And at the end of the day there are many issues where we simply don’t know what is wrong and what is right – it is difficult to prove things in biology 😉 – so it should be acceptable to “agree to disagree” as long as an issue is still pending. And in some cases we don’t even all agree that the issue is pending.
So let’s keep the discussion going in a civilized but lively way, and let’s make more biological discoveries. While people do the science what stays is the science not the people.
June 9, 2015 at 7:08 am
Patrick Phillips
I continue to be amazed at the energy everyone is displaying in this and related discussions. It definitely makes me feel even more lazy than usual. Thanks to everyone involved.
One thing that I don’t think has been discussed quite enough is the need to lower the cost of being wrong. Right now, admission of being wrong is usually equated with being a bad scientist, as opposed to a natural part of the scientific process. Feeling attacked or “shamed” is not going to help this, although the truth does need to come out (every crisis-management PR person will tell you that moving quickly to own the truth is the best way forward in tricky situations). The move in PPPR to view everything as an ongoing conversation is certainly a trend in the right direction, although ensuring a general tendency toward high-quality discourse will remain an issue.
An interesting historical example is W. Ernest Castle, the Harvard geneticist who was so frequently wrong about early genetics that it became something of a joke. Yet he always reversed himself when appropriate, and his lab went on to found virtually all of modern mammalian genetics.
June 9, 2015 at 11:03 am
ianholmes
I am curious how many computational biology papers have ever been retracted. I found one, the following PLoS CompBio paper which claimed that Bayesian phylogenetics did not work, a result that turned out to be due to a bug in a Perl script:
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0030158
I have not seen any papers that have been retracted due to fraud or deception, but I would be surprised if that sort of thing hasn’t happened in compbio as much as in other fields.
In general, I don’t think you can really trust any computational papers where the source code & data are not available free of charge. I actually find this far more problematic than closed access: it’s one thing to charge someone for a paper, it’s a far worse thing — IMO — to charge them EXTRA if they want to verify it. At least the cost of the paper is up-front in a closed-access journal. The hidden cost of putting up a paywall to verify a result is much greater.
For myself, I rather wish the authors of KBL would publish the software they used for comparative analysis of yeast genomes, not just the reviews of their written summary of that software.
June 9, 2015 at 4:21 pm
nsaunders
Difficult to estimate retractions. You can search PubMed by journal abbreviation e.g. “PLoS Comput. Biol.” and “Retracted Publication[PTYP]” (that gives 2 results currently); 1 result for “BMC Bioinformatics” and a smattering of other journals with “Comput” in the title. But computational biology/bioinformatics articles may appear in other non-specialist journals. May be possible to search on MeSH terms I guess.
One of the more infamous examples of retractions due to computational error happened in structural biology, nicely summarized here: http://boscoh.com/protein/a-sign-a-flipped-structure-and-a-scientific-flameout-of-epic-proportions.html
June 19, 2015 at 10:50 am
ianholmes
I would like to clarify my above comment about compbio retractions to make it clear that this comment was not directed at Kellis et al, but was a general remark in response to Lior’s post about computational biologists admitting when they are wrong. My observation is that compbio as a field seems to have relatively few retractions, and my curiosity is whether this impression is supported by data. I did not mean to imply that the KBL paper should be retracted.
I followed this comment with a remark about releasing code. Again this is a belief that applies to all compbio work and not just KBL. In general I think compbio papers should post their code, for reasons of reproducibility. Verbal descriptions of code are almost always incomplete, and without a way to run the code itself (and ideally scrutinize it), I consider that a methods section is incomplete. I favor the Titus Brown approach of releasing the entire workflow.
The KBL yeast paper in particular is one where I would like to see the code released, because that work described major advances in genefinding sensitivity & specificity by using the indel patterns as a signature (indels within ORFs are a multiple of 3 bases long). I think it is important to verify and reproduce this work, it is an area I care about (having been modeling indels and alignments for some time) and so I would like to see the KBL code.
Speaking more generally to Lior Pachter’s criticisms of Manolis Kellis, in his “Network Nonsense” post Lior says “In academia the word fraudulent is usually reserved for outright forgery” and goes on to argue for a broader use of the word “given what appears to be deliberate hiding, twisting and torturing of the facts”. I believe this is unhelpful. The word “fraud” is reserved to mean forgery because forgery, properly, has very severe consequences. In other places Pachter has indicated that he wishes those consequences upon Kellis (saying that Kellis should lose his job, for example). But “hiding, twisting, and torturing the facts” is a subjective description of events, as (in fact) is “deception deliberately practiced in order to secure unfair or unlawful gain” (a quote Lior does not source, but which Google attributes to the American Heritage Dictionary). The fact is that compbio is a hype-rich field, and few (including myself) would escape the charge of using at least some exaggeration or hype to describe their work, a practice which I agree is deplorable and could easily be characterized by someone more rigorous as “deception deliberately practiced in order to secure unfair gain”. Because, after all, we all deceive ourselves first, to some extent; don’t we?
The most serious accusation Lior makes is that Kellis et al replaced the text of a figure to subtly but significantly alter its meaning. I personally think that publicly inviting them to publicize this change as an erratum would be a more helpful approach than accusing them of fraud. Clearly Lior believed differently: he says that he thought long and hard before making this accusation, so one must assume he considered less draconian options than calling for Manolis to be fired as a fraud.
I completely support Lior’s right to criticize specifics of Manolis’ work, using whatever theatrical stunts he chooses to draw attention to these criticisms, including prizes and hyperbole. I would indeed praise his work as a post-publication peer reviewer: I believe the criticisms are, to a greater or lesser extent, valid. Network deconvolution probably has more hidden data-dependent parameters than Feizi et al admitted at first (so do a lot of methods: compbio code is ridden with hidden parameters). The choice of models for homologous protein rates in the yeast genomes paper could have been broader. But these are not out of line with the sorts of distortions or vagaries that (unfortunately) occur very often in compbio.
I would dearly like to see compbio become more self-critical. I think Lior’s blog is an important step in this direction, and a valuable experiment in post-publication peer review. I think that one take-home message of this experiment is that one should be careful of accusations like “fraud”, which (as Lior acknowledges) have more than one meaning: a dictionary meaning in common usage, and a far more precise meaning that is specific to scientific ethics. Conflating the two risks devaluing the latter, and muddling up valuable scientific discussion with ad-hominem criticism. Let us reserve the word “fraud” for outright forgery. There are other terms (bluster, hype, subjectivity, bias, lack of rigor) that better characterize what Lior is getting at.
Lastly, I am trying to tread a fine line here. Unlike others, I am not attacking Lior. He has broken new ground with this blog. Rhetoric and showmanship are an important part of what he is doing. Knowing him, I expect he will not back down from his accusations of fraud nor his demands for Manolis’ job. That’s up to him. I’m simply saying where I stand. Deliberate, outright, result-faking fraud is a very serious issue that rightly needs to be a line in the sand for all sciences. Hype, self-serving bias, and irreproducibility are major problems that confound bioinformatics; they need to be solved, but not by conflating them with fraud. Prizes, critiques, fierce hyperbole: all are fair game. I find Lior’s style entertaining, and his work is excellent, but I also need to mention that I am a great admirer of work that’s come from Manolis’ lab too. The phylogenomics methods with Matt Rasmussen spring to mind. Manolis has also participated in some amazing biological discoveries, such as those involving the Piwi-interacting RNAs. There are many, many more. So I would be very disappointed if his career were significantly negatively affected as the result of one figure change which could easily be published as an erratum, or because he defended a choice of null model.
June 19, 2015 at 8:43 pm
Lior Pachter
Dear Ian,
I appreciate your thoughtful comments and careful consideration of some of the posts on the blog.
For the record, I want to clarify that the accusation of fraud in that post was based specifically on the fact that in Feizi et al:
1. the method used to obtain the results in the paper is completely different than the idealized version sold in the main text of the paper and
2. the method actually used has parameters that need to be set, yet no approach to setting them is provided. Even worse,
3. the authors appear to have deliberately tried to hide the existence of the parameters. It looks like
4. the reason for covering up the existence of parameters is that the parameters were tuned to obtain the results. Moreover,
5. the results are not reproducible. The provided data and software is not enough to replicate even a single figure in the paper. This is disturbing because
6. the performance of the method on the simplest of all examples, a correlation matrix arising from a Gaussian graphical model, is poor.
To characterize this as an issue where “Network deconvolution probably has more hidden data-dependent parameters than Feizi et al admitted at first (so do a lot of methods: compbio code is ridden with hidden parameters).” is not accurate. This is not a case where, as you say “compbio is a hype-rich field, and few (including myself) would escape the charge of using at least some exaggeration or hype to describe their work”. I personally fail to see how issues 1–6 above constitute “some exaggeration” or “hype to describe their work”.
Furthermore, you say that you “personally think that publicly inviting them to publicize this change as an erratum would be a more helpful approach than accusing them of fraud.” In that regard, I’d like to point out that (a) the change of figure S4 by the authors was done as a direct result of our communication with them but not disclosed honestly and (b) in addition to extensive communication with the authors we also submitted a comment to NBT that was rejected after (we believe) negative feedback from the authors and (c) as we stated in the blog post, we “really really wish we could share the reviews”. Although we can’t, because they are confidential, I’d just like to remind you that the authors could do so at any time, lending full transparency to the matters at hand and establishing (if that is what you believe) that our blog post was unwarranted given, as you say, that a simple “erratum” could have been published. Also, and this is a minor point, (d) although the blog is now widely read, constituting what you might deem a “public forum”, at the time (September 2013) the blog was only a few weeks old and had garnered only a handful of views. What public forum, other than the one we pursued (a letter to the journal), do you suggest we should have tried?
Finally, on the matter of examples of good work coming from Manolis’ lab, I’d like to make clear that I don’t think that the commission of fraud in science is ok if the person who committed it also has their name on other work that is of some value.
Thanks again,
Lior
June 9, 2015 at 4:22 pm
homolog.us
Peer reviews always used to be ‘post-publication’.
“The Original Purpose of Peer Review”
http://www.homolog.us/blogs/blog/2015/06/09/the-original-purpose-of-peer-review/
June 9, 2015 at 5:38 pm
Brian Raney
I’m very optimistic about the future of scientific enquiry, but this is mostly based on my understanding that science is becoming increasingly irrelevant. We’re never going to have something that scares normal people into caring about what we say more than the atomic bomb or cell phones have. In the future, science will be argued out by the priesthood largely outside the public’s view, without this need to appear relevant that is, unfortunately, the driver of successful careers in modern genomics and in science in general.
June 10, 2015 at 6:05 am
Mr. Nobody
Thanks for these posts. I have truly enjoyed them, and they have helped me recover my faith in science. I also want to thank all the commenters; it has been really instructive.
I am a biologist, and have no problem admitting (not proudly) my statistical limitations; statistical illiteracy is quite widespread in the genomics field. I am trying to remedy this, especially nowadays when data are so big.
I think you are totally right in all your statements, but especially about the null model. We DO NOT reflect enough on this matter when we are inferring evolutionary properties. The truth is that we are clueless about how evolution works most of the time. If you don’t believe me, ask these guys working on alignment uncertainty (Grauer, Notredame, just to mention a couple). In molecular evolution, we make lots of assumptions which are generally accepted by the community and not really accurate, to say the least. To mention a simple one, we assume that we know how to distinguish orthologues from paralogues in cases where genes are lost. Well, the truth is that we simply cannot. Moreover, we are neglecting events of domain shuffling, accretion, permutations, etc., when making such assumptions.
We frequently neglect events such as genome contractions (reported in Saccharomyces) that could well be influential in the context of further WGD. So, my point here is that doing bioinformatics without knowledge or input from strong population genetics is really a bad exercise, and sadly we are witnessing a large expansion of these bad exercises in Nature/Science/Cell papers. And the truth is that all these genomic papers making evolutionary claims lack basic knowledge of evolutionary biology. My point is that before attempting to make any sense out of p-values, we really need to understand the underlying biology and therefore establish a proper null model for all the uncertainties we are assuming as true or known events.
This is problematic, since this kind of exercise takes lots of time and rethinking, with little chance of making it into NCS. Sadly enough, we are forced to do science that optimizes metrics, not knowledge.
June 10, 2015 at 8:21 am
Anon
Out of curiosity, are there any works or papers of yours that you feel, in retrospect, may be wrong or have any errors?
June 10, 2015 at 2:38 pm
Lior Pachter
Yes. For example, in Trapnell et al. 2010 I advocated for the use of FPKM as a unit for RNA-Seq, which I later came to believe was not optimal. Instead, it is better to use TPM (transcripts per million), and I made this point and corrected my error in a presentation three years later.
June 10, 2015 at 3:20 pm
Anon
Awesome, thanks! I’ve really appreciated this entire thread and hope that some of the taboo around admitting one’s mistakes goes away!
June 10, 2015 at 9:01 am
Stephen Floor
I think part of the challenge in biological research is to define “truth”. In mathematics and computer science there is usually objective truth; in physics, theories are derived from first principles and then tested against experiment. In biology, truth is much more operational. For example, if a scientific breakthrough is made that uncovers some unappreciated biological process, does it render all prior publications that interpreted their results without the knowledge of this process wrong?
Because of this, it is difficult to say which “time capsule” publications from 11 years ago should be opened and re-examined. If the null model is based on contemporary understanding and that leads to anomalously large p-values, and later breakthroughs invalidate the null, does that make the research wrong? I think this is part of the reason people are opposed to examining old papers in biology in particular. As biology becomes more quantitative, this philosophy should be abandoned in favor of objective truth, when possible.
June 11, 2015 at 10:36 am
Claudiu Bandea
Stephen,
You made a good point about “time capsule” publications; however, if these publications continue to be regarded as highly relevant and are still being cited, then, they should *not* be spared from critical evaluation.
You also mentioned the challenge of defining the “truth” in biology. Let’s see what Susumu Ohno, whose ideas were evaluated in the KBL paper, had to say about “truth”:
“Over the years, I have learned that there is no such thing as a fact. What passes for a fact is in truth a set of observations and its interpretation. Therefore, the interpretation is just as important to a fact as the observation itself” (Ohno S. 1973. Evolutional reason for having so much junk DNA. In Modern Aspects of Cytogenetics: Constitutive Heterochromatin in Man (ed. R.A. Pfeiffer), pp. 169-173. F.K. Schattauer Verlag, Stuttgart, Germany).
I think the vast majority of scientists would agree with Ohno that the interpretation of data and observations is as important as, if not more important than, the process of generating them. However, unfortunately, that is not reflected in the current system, at least not in the biomedical fields and biology in general, in which generating data and observations and packaging them in publications is key to academic and career success.
The result is tons of data and observations that are often unreliable, artificial or misinterpreted, and there is relatively little investment in an independent and systematic system for evaluating them, which could be funded at a fraction (e.g. 1%) of the cost. I’ll follow this comment, shortly, with a proposal for such an evaluation system.
June 11, 2015 at 3:46 pm
Claudiu Bandea
Can somebody else (NIH) please show that ‘I was wrong’
While the KBL authors are fighting to uphold their Nature trophy to ‘no wrong’, many of us would love to have our papers evaluated, even if graded ‘wrong’; at least, that would be an acknowledgement that our papers exist.
Take, for example, a paper on the evolution of genome size and the putative biological function of the so-called “junk DNA” ( http://www.ncbi.nlm.nih.gov/pubmed/2156137), which I published 25 years ago.
I don’t know if any of my peers read the paper or not, but its existence was not acknowledged by a single citation, I would guess because it was wrong. Nevertheless, I recently took advantage of the ‘ENCODE kerfuffle’ (to use Lior’s term) and wrote a second paper on the same idea ( http://biorxiv.org/content/early/2013/11/18/000588). Well, this second time around it was confirmed: apparently, my hypothesis came “pre-refuted.” However, I was thrilled that it was pre-nominated as “a beautiful hypothesis” (http://judgestarling.tumblr.com/post/67599627086/a-pre-refuted-hypothesis-on-the-subject-of-junk). Thank you Judge Starling.
However, as much as I appreciate Judge Starling, and for that matter our host Lior Pachter, I think, we need a more systematic approach to review and evaluate science.
Indeed, it makes little sense to invest thousands of millions of dollars (courtesy of tax payers and donors) in funding scientists to produce data and papers, without investing a fraction of that (let’s say, 1%) in an independent review platform/system (RS) of its products. Expectedly, a RS would be:
(i) timely (that means, days),
(ii) open (available on-line),
(iii) comprehensive (have all publications/results critically reviewed by multiple reviewers).
There is no shortage of highly competent and enthusiastic scientists who would love “making a living” (i.e. obtaining grants) as RS-reviewers, either on a full-time basis, e.g. as ‘independent contractors’, or part-time in conjunction with their academic positions. Moreover, although RS-reviewers would represent the core of the RS, all scientists would be encouraged to contribute RS entries, which would be a great opportunity to build an ‘RS portfolio’ and apply for RS grants.
Any ideas on how to fund/implement this type of project? NIH?
June 10, 2015 at 9:10 pm
astoltzfus
I don’t think the issue has been fully resolved. Kellis, et al presented a simple binomial test of A vs. B, showing that A is greater than B by a huge margin, where A = one accelerated vs. B = both accelerated.
This is a legitimate test. It is just not very meaningful.
Various people have explained that the test presented by Kellis, et al does not make much sense if the null hypothesis is independent evolution of every gene. Under the independent evolution model, the results are not at all “striking”.
Kellis and Lander object that the idea of independent evolution is absurd. The rate of evolution of duplicates is not independent.
So, what is the correct test under non-independence? Perhaps this has been explained previously by others. I did not go through all the comments on this point.
However, I would suggest that the test presented by Kellis, et al is also not reasonable under the hypothesis of non-independence.
It is easy to show why this is meaningless with some simulation code.
# 1000 values for evolutionary rate of genes with mean 100 and sd = 10
a <- rnorm(1000, 100, 10)
# add paired values that differ by mean = 0, sd = 10
dupls <- data.frame(a = a, b = a + rnorm(1000, 0, 10))
# see the relation with plot(dupls)
# pick the top 10 % as the "accelerated" class
threshold <- quantile(c(dupls$a, dupls$b), 0.9)
accel <- dupls[dupls$a >= threshold | dupls$b >= threshold, ]
# now compare
dim(accel)[1] # all pairs that have at least 1 duplicate accelerated
sum(accel$a >= threshold & accel$b >= threshold) # pairs with both accelerated
What this does is simulate the rates of pairs of duplicated genes, where the rates are correlated. We have a bunch of pairs (A1, B1), (A2, B2) and so on. We prune this set to include only pairs in which at least one member is accelerated above a particular threshold. Because we are asking for extreme random deviates, the original correlation is degraded, necessarily. For the particular values I have chosen, which were just the first values that came to mind, it turns out that the number of cases in which only one member of a pair is accelerated far exceeds the number in which both members are accelerated. So, the test proposed by Kellis, et al is relatively meaningless even when the evolution of duplicate pairs is correlated.
June 11, 2015 at 5:05 pm
Marnie Dunsmore
Cool. Thanks.
Can you talk about this statement: “Because we are asking for extreme random deviates, the original correlation is degraded, necessarily. ”
Why does this happen? Why does the original correlation become degraded?