A blogpost by Lior Pachter at https://liorpachter.wordpress.com/2013/10/21/gtex/ suggested that GTEx uses a suboptimal transcript quantification method. Using simulated data, he showed that the method used by GTEx obtained a quality score that a different method could reach with 10-fold less data. This result led to the sensational subject line: “GTEx is throwing away 90% of their data”. We take such concerns very seriously and intend to be transparent with the community regarding our decisions. We stand by the processes and decisions we have taken to date given the data and time constraints, but of course we always remain open to community suggestions for improvement. Below we answer the questions raised by the blogpost. We thank Lior Pachter for hosting this on his blog.
– Is GTEx LITERALLY throwing away 90% of the data?
Absolutely not! The GTEx project is not throwing away any data at all. In fact, it provides all data, in both raw and processed forms, to the scientific community at the dbGAP site (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000424.v3.p1) and on the GTEx portal (http://www.broadinstitute.org/gtex/). GTEx is running multiple tools to perform various analytical steps. The initial analysis of the GTEx pilot project data is completed and frozen, and the consortium is in the process of writing a manuscript describing the results. For transcript abundance quantification, which comprises a small fraction of the overall analyses of the pilot data, the consortium used FluxCapacitor and provided the outputs on the GTEx portal. Specifically, these results were used for analyses of splice- and isoform-QTLs and for assessing tissue-specificity of loss-of-function variants. Tools used in the pilot phase of GTEx will be re-evaluated, based on benchmarking results and large-scale production considerations (see below), and alternative methods, including for transcript quantification, may be employed in the future.
– Is the FluxCapacitor (FC) a published method?
Yes, although not as a standalone methodological paper but as part of a large study where RNA-seq data were used to perform eQTL analysis (Montgomery et al., Nature 2010). That paper was peer reviewed and includes a 1.5-page description of the methodology in the methods section and a supplementary figure (Supplementary Figure 23) describing the method’s outline (http://www.nature.com/nature/journal/v464/n7289/full/nature08903.html). In addition, there is extensive documentation on the web at http://sammeth.net/confluence/display/FLUX/Home supporting the publication and offering download of the software. Indeed, the level of detail of the description of the method in the paper is not as comprehensive as in some standalone methods papers. We thank Lior for pointing this out and note that the documentation has been updated and will continue to be improved by the authors of FC.
– Why did GTEx decide to use the FluxCapacitor method in its pilot analyses?
In order to understand the decision to use FluxCapacitor (FC) in GTEx, one first needs to understand how such decisions are made in GTEx (see below). Initially we used Cufflinks (CL), which is the most commonly used tool in the field for quantifying isoforms. However, when using it at large scale (1000s of samples) we hit technical problems of large memory use and long compute times. We attempted to overcome these difficulties: we investigated the possibility of parallelizing CL and contacted the CL developers for help. However, the developers advised us that CL could not be parallelized at that point. Due to project timelines, we started investigating alternative methods. The Guigo group had already produced transcript quantifications with FC on GTEx data, demonstrated biologically meaningful, high-quality results (http://www.gtexportal.org/home?page=transcriptQuant), and provided the tool and support to test it and install it as part of the GTEx production analysis pipeline. Our current experience with FC is that it scales well and can be used in a production setting.
– Are exon level or gene level read counts appropriate for the purposes of GTEx and Geuvadis eQTL analyses?
Another criticism raised in the blogpost was regarding the use of exon-level and gene-level read counts, as opposed to transcript abundance levels, for calculating eQTLs in both the GTEx pilot project and the recently published GEUVADIS paper (Lappalainen et al., Nature 2013). The eQTL analyses for both projects mainly used exon-level and gene-level expression values, but did not use simple read counts; rather, they used carefully normalized values after correcting for multiple covariates (as described in Lappalainen et al.). Transcript (or isoform) abundance levels, which are the subject of Lior Pachter’s simulations, require more sophisticated estimation methods (such as FC, CL or others), since one needs to deconvolute the contribution of each isoform to the exon-level signals (as different isoforms often share exons). We deliberately did not use isoform-abundance levels for the major part of the analysis, since this is still a very active area of research and new tools continue to be developed. The analyses in Lappalainen et al. that used transcript quantifications, such as population differences, were replicated with other approaches with highly consistent results (Fig. 1c, Fig. S14).
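To illustrate why isoform-level estimates require this kind of deconvolution, here is a toy sketch (a hypothetical two-isoform gene solved by least squares; this is not the algorithm used by FC or CL, only the underlying idea that exon-level signals are linear combinations of isoform abundances):

```python
import numpy as np

# Toy gene: isoform A contains exons 1 and 2; isoform B contains exons 2 and 3.
# Exon-level coverage is a linear combination of the unknown isoform
# abundances, so recovering abundances requires solving a small linear
# system -- the "deconvolution" step.
# Rows = exons, columns = isoforms; entry = 1 if the isoform contains the exon.
design = np.array([
    [1, 0],   # exon 1: only isoform A
    [1, 1],   # exon 2: shared by A and B
    [0, 1],   # exon 3: only isoform B
], dtype=float)

# Hypothetical exon-level coverage values (arbitrary units).
exon_coverage = np.array([10.0, 25.0, 15.0])

# Least-squares estimate of the two isoform abundances.
abundances, *_ = np.linalg.lstsq(design, exon_coverage, rcond=None)
print(abundances)  # ~[10., 15.]: contribution of each isoform
```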
Our results suggest that exon-level and gene-level quantifications are much more robust in cis-eQTL analysis. In the GEUVADIS paper we discovered ~7800 genes with exon-level eQTLs, and ~3800 genes with gene-level eQTLs. Initial tests for eQTL discovery using transcript quantifications from FC found <1000 genes with transcript-level eQTLs, and led to our decision to use the much more powerful exon and gene quantifications in that paper. Given the relatively small difference in correlation coefficients described in Lior Pachter’s post between FC and CL (~83% vs. ~92% for 20M fragments), and our previous experience mentioned above, we find it very unlikely that CL or any other transcript quantification method would discover a number of genes with transcript eQTLs anywhere near the 3800 or the 7800 figure. This suggests that transcript quantification methods do not capture biological variation as well as more straightforward exon and gene quantifications.
The issue raised in the blog concerns estimating relative expression levels of isoforms within a single sample, measured using the Spearman correlation metric. However, for single-gene eQTL analysis, which is the main goal of the GTEx pilot analysis, one searches for significant correlations between genotypes and expression levels of a single gene compared across subjects. This correlation is robust against shifts (i.e. adding or subtracting a constant value) and changes in scale (applying a constant factor) of the expression levels. Therefore, normalizing to the alignable territory of the gene (which will introduce the same factor across samples), and ignoring ambiguously mapped reads (which likely introduces a constant shift), will have little or no effect on the resulting eQTLs. We are constantly evaluating further quantification methods, but we believe that the eQTL results calculated as part of the GTEx pilot and the GEUVADIS paper are solid and robust.
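The invariance appealed to here is easy to check numerically. A minimal sketch with hypothetical genotypes and expression values (not GTEx data): applying a constant positive scale factor and a constant shift to expression leaves the Spearman correlation with genotype unchanged, because the statistic is rank-based.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical genotypes (0/1/2 allele counts) and expression values
# for one gene across 100 individuals.
genotype = rng.integers(0, 3, size=100)
expression = 5.0 + 0.8 * genotype + rng.normal(0, 1, size=100)

rho_raw, _ = spearmanr(genotype, expression)
# Apply a constant scale factor (e.g. normalizing by alignable territory)
# and a constant shift (e.g. dropping a fixed number of ambiguous reads).
rho_transformed, _ = spearmanr(genotype, 0.37 * expression - 2.0)

print(rho_raw, rho_transformed)  # identical: ranks are preserved
```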
– How does GTEx decide on methodologies to be used?
GTEx strives to provide to the community the highest quality raw and derived data as well as the highest quality scientific results. GTEx also operates with clear and rigid timelines and deliverables. Therefore, we prefer to use methods that already have been vetted by the community and can be used in a large-scale production setting. In this particular case the experience with FC as part of the GEUVADIS project was part of the consideration.
Systematic benchmarking of tools is very important and we encourage the community to conduct such benchmarks. Proper benchmarking is not a simple task — one needs to carefully define the benchmarking metrics (which depend on the particular downstream use of the data), often there are not sufficient ground truth data, and simulations can often be deceiving since they don’t reflect real biology or experimental data. Therefore, whenever there are no published benchmarks that use metrics that are relevant to the GTEx project, we have to perform them within the project (prioritized by the impact on the results). One example is a recent comparison of different alignment methods (including TopHat and GEM), which was recently presented at the ASHG conference in Boston.
Although isoform-level quantification is only a minor part of our current analyses, we continue to evaluate several methods (including FC and CL) and welcome any constructive input on this evaluation from the community. The results obtained by Pachter’s simulations suggest that CL outperforms FC (based on the Spearman correlation metric) but this seems to be at odds with our experience (perhaps due to unrealistic assumptions in the simulations). Clearly, further benchmarking is required to better understand the differences between tools and their effect on the final results (http://www.gtexportal.org/home?page=transcriptQuant).
– Is GTEx open to feedback and what are the mechanisms in place?
GTEx welcomes and encourages feedback and interaction with outside investigators to improve the analysis and data production. We offer several mechanisms for interaction: (i) The GTEx datasets are available for download (some require applying for access to protect donor privacy) and have already been shared with over 100 different research groups that carry out their own analyses of the data; (ii) We had a widely-attended international community meeting in June and plan to hold such meetings yearly, giving the opportunity for external groups to share their results; and (iii) we welcome e-mails to GTEXInfo@mail.nih.gov and comments in the GTEx portal (http://www.broadinstitute.org/gtex/). Finally, we are interested in facilitating systematic benchmarking of tools and investigators that are interested in participating in such benchmarks or in defining the evaluation metrics are welcome to contact us.
To summarize, GTEx strives to produce, and make publicly-available in a timely manner, the best possible data and analysis results, within data release and practical limitations. By no means do we feel that all our analyses are the best possible in all aspects, or that we will perform all the different types of analysis one can do with these data. We are open to constructive feedback regarding the tools we use and the analyses we perform. Finally, all data are available to any investigator who desires to perform novel analyses with their own methods and we anticipate that much improved and innovative analyses of the data will emerge with time.
Manolis Dermitzakis, Gad Getz, Krisitn Ardlie, Roderic Guigo for the GTEx consortium
41 comments
October 31, 2013 at 7:16 am
Stephen Turner
Good points. So, who’s going to do the big simulation & spike-in experiments to fully and impartially assess the strengths and weaknesses of approaches for differential gene expression analysis, using methods as disparate as transcript quant with cufflinks, express, etc., versus other methods like FC, counting + DESeq/edgeR, DEXseq and other differential exon usage approaches, etc.? Like an assemblathon for RNA-seq differential expression analysis.
October 31, 2013 at 9:24 am
Sujai Kumar
The Barton group at Dundee http://www.compbio.dundee.ac.uk presented a terrific study at the UK Genome Science meeting in September 2013 (Nottingham) evaluating 48 RNA-seq replicates of a yeast diff exp study, complete with spike-ins (done in collaboration with Edinburgh Genomics – genomics.ed.ac.uk). Not sure where it is being published.
October 31, 2013 at 5:46 pm
Steven Salzberg
I have been following this debate with great interest. I applaud the GTex consortium for releasing large quantities of data for the community to analyze. However, I have to agree with Lior’s primary concerns, that the pre-computed analyses do not use the best methods. Although Manolis Dermitzakis and colleagues make many valid points, in the end their claim that “we prefer to use methods that already have been vetted by the community and can be used in a large-scale production setting” rings false. Somehow they reject the most widely-used methods, which as Lior pointed out have been published in peer-reviewed journals, and somehow their own methods seem to be the ones they settled on.
Their claim that FC has been published also rings hollow. They write “That paper was peer reviewed and includes a 1.5 page description of the methodology in the methods section and a supplementary figure.” Many of us know that (a) methods sections are not reviewed in detail and are often simply ignored, especially when they’re part of a larger paper that is not about the method, and (b) the supplements are even more likely to be ignored. In contrast, multiple methods papers have been published on competing methods. (Disclosure: I am a co-author on some of those papers.)
I think the community is not being well-served by running possibly inferior software on the very large GTex data sets. As Lior points out, and I agree, many researchers are not going to go through the large effort to re-run other methods, and will simply assume that GTex did a careful job and they can trust the results. If this assumption is violated, then attempts to build upon the GTex data might end up with incorrect results.
I would challenge the GTex consortium to run analysis software that is *not* authored by any of their core members on all data sets. If they want to also run their own favorite methods, they could produce two sets of processed data, and then others could choose. I would also point out that it’s not so hard to run someone else’s software in a way that makes it look inferior, and GTex should try hard to make sure that whatever methods they run, even if not their own, are run in an optimized way.
-Steven Salzberg
Johns Hopkins University School of Medicine
November 1, 2013 at 5:34 am
edermitzakis
I think the point raised by Steven Salzberg is very good. However, the specific criticism to GTEx is exaggerated, to say the least. We never said that cufflinks is inferior to flux or that flux is exemplary in terms of publication strategy and documentation. This does not make it inferior either. It was a practical decision given specific timelines and deadlines and in our hands the software appears to produce reasonable results for this phase of the project. Will we keep using flux or only flux for all phases of GTEx? No, and as we said we will be evaluating transcript quantification methods in order to decide which ones are the best for the analysis we plan to do next and for the community.
Of course, we run the methods that many or most people use and have been published (e.g. TopHat for read mapping, matrix-eQTL for eQTL analysis, PEER for covariate correction etc). Some of them may be authored or co-authored by members of the consortium, but this is expected. You really want some of the experts in the field to be part of GTEx. It is not fair to take the one example of flux and generalize to all methodologies.
One has to also realize that a fraction of the GTEx funds went into competitive peer reviewed grants (RFA) for methodology development required to address areas of the project not as methodologically advanced as others (e.g. estimates of tissue specificity of eQTLs, discovering causal variants etc) or for improvement of existing methods. It is therefore expected that a fraction of the methodologies used in the project are new and are expected to be published separately and at the same time as the GTEx pilot paper.
Of course, we will be happy to provide data analyzed by methods other than the ones we can think of or used for the pilot paper (for a paper one cannot use all methods for all analyses), and that’s why we welcome feedback on this, on what the community thinks is relevant or important. As described in the response, these mechanisms have been in place for years and to our knowledge there has never been any complaint that we did not respond to such requests. Providing raw or analyzed data to the community has been a very high priority for the consortium and is on the agenda of all our steering committee and analysis meetings.
It does not take any particular “challenge” other than requesting it!
Manolis Dermitzakis,
University of Geneva
November 1, 2013 at 8:39 am
Nicolas Bray
First I’d like to say that I don’t understand the repeated insistence that GTEx is not literally throwing away data. I think it’s obvious to everyone that Lior was not claiming that people were going around deleting files or anything of that nature.
The fact that the people of the future can analyze the data again is not much of a comfort. Steven points out the effort involved in reanalyzing the data as well as the trust that people inevitably place in the analyses done by consortia. I’ll be a bit gauche and ask: supposing someone did put in the effort, computer time, etc., to reanalyze this data, what would they get out of it? The consortium paper will be published in (going out on a limb here) Nature and be cited many times. Any reanalysis paper will be published in a much lower-profile journal (if at all) and hardly even be noticed in comparison. I know we’re supposed to pretend that such incentives don’t exist in the real world but we are all time and resource constrained and this is the reality.
On the subject of simulation, I think everyone in this field is well aware of the dangers here. We are in the business of analyzing the real world, not simulations, and so performance on a simulation is only good to the extent that that simulation reflects some kind of biological reality. However what you call “the Roberts-Pachter schema” is similar in spirit to that used in so many RNA-seq programs for the simple reason that it is not, in fact, some random schema cooked up by Lior and Adam but rather the basic statistical model of RNA-seq. And it is the basic statistical model of RNA-seq for the simple reason that it makes a good deal of sense: reads begin at a position in your transcriptome with frequency equal to that position’s expected frequency in a pool of molecules coming from transcript abundances.
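For concreteness, here is a minimal sketch of that basic generative model (my own illustration with hypothetical transcript lengths and abundances, not the exact simulation code discussed in the post): each read picks a transcript with probability proportional to abundance times the number of valid start positions, then a start position uniformly along it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical transcriptome: lengths (bp) and relative abundances
# (fraction of molecules in the pool).
lengths = np.array([1500, 800, 2200])
abundances = np.array([0.5, 0.3, 0.2])
read_len = 100

# Under the basic model, a read starts at a given position with probability
# proportional to that position's frequency in the pool of molecules:
# choose a transcript ~ abundance * number of valid start positions,
# then a start position uniformly within that transcript.
starts_per_tx = lengths - read_len + 1
weights = abundances * starts_per_tx
weights = weights / weights.sum()

n_reads = 1_000_000
tx = rng.choice(len(lengths), size=n_reads, p=weights)
pos = rng.integers(0, starts_per_tx[tx])  # uniform start within each chosen transcript

# Read counts per transcript are proportional to abundance * (effective) length,
# which is why quantifiers must correct counts for length.
print(np.bincount(tx) / n_reads)
```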
Does this reflect all the biological complications of RNA-seq in the real world? Of course not. But a) it is a baseline model which anyone claiming to do RNA-seq quantification should perform well on and b) the departures from this model are more controversial. If Lior had wanted, he could have used a real “Roberts-Pachter schema”, namely generating reads with bias according to the bias model as used in eXpress. He didn’t do this precisely because the exact nature of bias in RNA-seq and how to model it are topics still being debated.
However I might add something with regards to the statement: “The simulation does not share the objective function optimized by either Flux Capacitor or Cufflinks.” If creators of Flux truly believe that Flux Simulator is a good model of RNA-seq, then I urge them to have Flux Capacitor optimize that objective function as best it can. After all, why would they want to optimize anything else? Models are not reality but, please, do well on your models!
However, to the extent that this comes down to differences in models of simulation, let’s take a look at Flux Simulator. I have to admit that I only today actually decided to find out what it is that Flux Simulator does and I’m afraid that I’m not filled with confidence. Looking at the website section on RNA hydrolysis, I saw first of all that fragment sizes are assumed to follow a Weibull distribution. I found this somewhat surprising since I’d never heard of anyone using such an assumption before (and indeed googling “Weibull RNA-seq” seems to yield only results pertaining to Flux Simulator) so I decided to look at the justification for this in the Flux Simulator paper. There I found a citation of a paper from 1985 (quite early for RNA-seq!) where it was demonstrated that the Weibull distribution described the size of the product from sonicating lipid vesicles. I really can’t imagine why we should think the same distribution would also describe the results of RNA fragmentation, nor is any justification given. Although perhaps this explains the cryptic statement on the Flux Simulator website: “Shape parameter delta reflects the geometric relation in which random fragmentation is breaking a molecule (e.g. d = 1 corresponds to uniform fragmentation on the linear chain of nucleotides, d = 2 splits uniformly the surface, and d = 3 the volume, etc.”. At least vesicles do have volume…
And going on from there we have this:
“The number of fragments produced from a specific RNA molecule is determined by n = len/E(d_max) where E(d_max) is the expectancy of the most abundant fragment size, computed from h [sic] and the gamma-function Lambda of delta: E(d_max) = eta*Lambda(1/delta + 1)”
Let’s start with the fact that d_max is a number, rather than a random variable so talking about its expectation (I assume that’s what was meant) makes no sense. Then there’s the fact that the formula given is for the mean of the distribution, rather than its mode as claimed. And a molecule of a given length always fragments into the same number of molecules? Bizarre.
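For readers who want to check the mean-versus-mode point, here is a quick numerical sketch (my own check, assuming the standard two-parameter Weibull with scale eta and shape delta; this is not Flux Simulator code): eta*Gamma(1 + 1/delta) is the distribution's mean, whereas its mode is eta*((delta-1)/delta)^(1/delta) for delta > 1, a different quantity.

```python
import numpy as np
from scipy.stats import weibull_min
from scipy.special import gamma

eta, delta = 200.0, 1.5   # hypothetical scale (bp) and shape parameters

dist = weibull_min(c=delta, scale=eta)

mean_formula = eta * gamma(1.0 + 1.0 / delta)                   # the quoted formula
mode_formula = eta * ((delta - 1.0) / delta) ** (1.0 / delta)   # actual mode (delta > 1)

print(mean_formula, dist.mean())   # formula matches the mean (~180.5)
print(mode_formula)                # the mode is different (~96.1)
```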
Beyond this type of objection, my understanding (which I admit might be flawed because my previous attempts at using Flux Simulator ended in frustration) is that FS generates a set of transcripts, fragments them, and then returns all those fragments as reads. This is fundamentally not a good model for RNA-seq as it is commonly performed. For an experiment using a small number of cells, yes. But virtually all RNA-seq experiments currently being done are performed on the RNA from millions of cells containing hundreds of billions of transcripts. The tens of millions of fragments that are then sequenced will almost always be sampled from distinct transcripts. The output of Flux Simulator has a huge amount of dependence that will not be present in actual data.
At this point, if the differences here come down to the method of simulation I am very happy to side with a basic, intuitive model of RNA-seq rather than whatever it is that Flux Simulator is doing. Yes, as you say, it has been published and used fairly widely. However I think this says less about Flux Simulator itself than it does about the somewhat perverse incentives that exist in computational biology. The developers of eXpress, Cufflinks, and RSEM have all developed their own simulation frameworks, but when faced with the effort of going from that to a distribution-ready program along with a publication, I think other projects seem more exciting.
As for claims that none of this matters for the inference of eQTLs, I’ll just note that – as has been pointed out many times before – simply looking at counts can be misleading. Or to put it more forcefully: it is incorrect. If a gene has a short isoform and a long isoform then a switch from short-dominant to long-dominant expression will lead to an increase in the number of reads even if the total expression of the gene does not change. Thus we have a false-positive eQTL.
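A back-of-the-envelope version of this scenario, with hypothetical numbers: hold the number of gene molecules fixed while usage shifts from the short to the long isoform; expected read counts, which scale with molecule length, change anyway.

```python
# Hypothetical gene with a short and a long isoform.
short_len, long_len = 500, 2_000   # bp
molecules = 1_000                  # total transcripts of the gene, held constant
reads_per_bp = 0.01                # sequencing depth factor

def expected_gene_reads(frac_long):
    """Expected read count summed over the gene, given the fraction of
    molecules coming from the long isoform (reads scale with length)."""
    n_long = molecules * frac_long
    n_short = molecules * (1 - frac_long)
    return reads_per_bp * (n_short * short_len + n_long * long_len)

# Genotype-associated switch from short-dominant to long-dominant usage:
print(expected_gene_reads(0.1))   # ~6500 reads
print(expected_gene_reads(0.9))   # ~18500 reads: almost 3x more reads even though
                                  # total expression (molecules) is unchanged, so a
                                  # count-based test could call a spurious eQTL
```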
In this case we’ve merely mistaken a splicing eQTL for a gene expression eQTL but in general, I think it’s hard to say what the effect of using an incorrect method will be. Without some serious justification, this makes the apparent method put forth of selecting the analysis method which gives them the highest number of results especially problematic. This is a dangerous game to play (only a couple steps short of tuning parameters to give the results you prefer) and surely one is better off doing something that actually makes sense and letting the results be what they are.
As for the claims that “SpliceQTLs found based on Flux quantifications reflect biological expectations”, well, that’s nice. But results which are enriched for things which you expect to find is the lowest possible bar for the analysis to cross. It is necessary but not nearly sufficient to show that what was done is actually reasonable.
November 4, 2013 at 6:54 pm
Kasper Hansen
Cufflinks may be hard to run for many samples. Nevertheless, it was used on all 922 samples in what I believe is the largest RNA sequencing study to date: http://genome.cshlp.org/content/early/2013/10/02/gr.155192.113.abstract
November 1, 2013 at 11:49 am
Charles Warden
Stephen – FYI, this study does utilize ERCC spike-ins and 1000 genes with qPCR validation for all of the common algorithms
http://genomebiology.com/2013/14/9/R95
November 1, 2013 at 12:02 pm
Jari Niemi
Dermitzakis wrote: “Initially we used Cufflinks (CL), which is the most commonly used tool in the field for quantifying isoforms. However, when using it at large scale (1000s of samples) we hit technical problems of large memory use and long compute times.”
Probably, running Cufflinks on 20% of GTex data would be 5 times faster, would require 5 times less computational resources, and would produce 100% better results than FC!
November 3, 2013 at 12:51 pm
Darya Filippova
I am not sure if the same reasoning would apply to RNA-seq data, but, for example, you could not reconstruct a genome 100% better if you used 20% of its sequence (i.e. 5 times less data).
November 1, 2013 at 5:03 pm
Joe Pickrell
Any reanalysis paper will be published in a much lower-profile journal (if at all) and hardly even be noticed in comparison
I think there’s a contradiction here. You’re arguing 1) that GTEx is “throwing away 90% of its data” and this is a Big Deal and 2) if you re-analyzed the data (and presumably used all 100% of the data) you would likely be unable to learn anything publishable, new, or interesting. These things don’t seem totally consistent.
I think you’re implicitly saying that whatever GTEx does is likely going to be good enough to pick up the majority of the interesting signals in the data, so that doing marginally better isn’t worth your time.
Overall I think it’s reasonable to say that GTEx should be using the best methods that exist, and to make suggestions for how they can do better. But these points are getting lost in the hyperbole and mockery.
November 2, 2013 at 12:12 pm
Nicolas Bray
“These things don’t seem totally consistent.”
In a perfect world, this would mean I was wrong.
But in a perfect world, I don’t think we’d be having this conversation to begin with.
November 1, 2013 at 11:09 pm
Christopher Mason
Interesting post Lior, and interesting work. I am hopeful that the view will improve and many of these discussions will be moot once the long cDNA reads become cheaper (PacBio, ONT, obviously not as much 454 any more). They solve many of these problems by getting complete 5′ and 3′ coverage of the genes. Some of this is coming soon from the SEQC/ABRF RNA-seq data sets, and more from ENCODE, so hang tight.
But for GTEx to say that they didn’t have the compute power is, I think, a feigned lack of resources – XSEDE and TACC provide a LOT of free resources (if you paid your taxes, then you have already paid for them!) to run huge amounts of RNA-Seq data through their gigantic clusters. Amazon and Google also try to provide a lot of free resources just to get the science PR buzz. We analyzed 40 billion 100×100 PE reads in a few weeks with Trinity on XSEDE for 13 species, and it just took a small six-page proposal to get the access.
http://www.ncbi.nlm.nih.gov/pubmed/23203872
Much of the source code for many RNA-seq tools is already on these clusters, and they love to help people run data through these servers – some, like Blacklight, have 16TB of RAM and are lots of fun to work with.
Nonetheless, I think GTEx is great for putting the data out there. I have my gripes with dbGAP, but that’s another story…
November 1, 2013 at 11:17 pm
Christopher Mason
One other thing (for Turner) – some of the DEG tools we have just examined and published, but more is needed for transcript-level work still. See here:
http://www.ncbi.nlm.nih.gov/pubmed/24020486
“Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data.”
Rapaport F, Khanin R, Liang Y, Pirun M, Krek A, Zumbo P, Mason CE, Socci ND, Betel D.
Genome Biol. 2013 Sep 10;14(9):R95
November 7, 2013 at 7:40 am
Ido Tamir
This is a great paper. Especially since the preprocessed input data is available on bitbucket and the scripts also look very runnable. Maybe one could add a build bot, e.g. Jenkins, to this and rerun the analysis + figure generation with additional scripts/parameters?
November 2, 2013 at 12:25 pm
edermitzakis
To Chris Mason: just to clarify regarding compute, the main problem was that cufflinks was unable to process and analyze, in a single thread, the number of reads per sample we provided. When we asked the first author of the paper and ex-member of the Broad whether cufflinks can be parallelized for transcript quantification of a single sample, the answer was that it is not possible.
November 2, 2013 at 1:40 pm
Steven Salzberg
Manolis: seriously? You couldn’t run Cufflinks because the number of reads per sample was too much? We routinely run Cufflinks on RNA-seq runs with over 200 million reads per lane, which is a full lane of a HiSeq. It’s hard to believe you don’t have the compute power to run Cufflinks. If you want to make this argument, you should tell us what exactly the data was and how you ran Cufflinks, and what exactly is this hardware that can’t run it.
November 2, 2013 at 1:56 pm
Lior Pachter
Manolis, I really wish to get to the bottom of this compute claim you are making.
Cufflinks certainly has its limitations. That is, in fact, why we wrote eXpress. Figure 2b in the eXpress paper is exactly about this. We went out to 1 billion fragments (for a single quantification). Cufflinks made it out to about 700 million reads, whereas eXpress easily handled a billion (this was all run on human with ENCODE, not RefSeq annotation).
This result was produced on our local group server. The machine has 512Gb RAM, although of course one can easily buy 1Tb these days. It cost us about 20K (with 48 cores, a bunch of disk, etc.). I’ll admit that I am a bit embarrassed to reveal here what we compute on. These days powerful hardware is often assumed to be a prerequisite for serious science being done. But I am, after all, 50% math, 50% MCB, and I advise computational biology students; I don’t manage IT people. We develop our software on modest hardware, partly because we think it makes sense to develop for what our end users will have access to in individual labs. Besides, it’s been enough so far (we just ran all of your GEUVADIS data on this machine).
GTEx was funded to the tune of $20 million (in the initial round) and they didn’t have access to a machine such as ours? Or is it the case that there is a single sample in GTEx that has more than 700 million reads?
The eXpress paper is here: http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.2251.html
November 2, 2013 at 3:32 pm
gasstationwithoutpumps
Although your criticism of GTEx may be justified, a 512Gb RAM machine is still not common equipment. Most of my computing is done on my Intel Core 2 Duo MacBook Pro, because the desktop machine I have at work is too slow (my son’s $240 Chromebook is 5 times faster than my desktop machine). My desktop machine is twice as fast as the nodes in the cluster that supports the protein-structure prediction server. Sure, I could replace all that old equipment for about $10k, but I don’t have $10k in funding (and if I did, I’d probably want to spend it so that my grad student doesn’t have to TA every quarter).
November 2, 2013 at 3:43 pm
Lior Pachter
Point well taken. My claims of modest hardware were made relative to the resources a consortium has. 512Gb RAM is a lot for a regular (biology) lab, but it is frequently needed in bioinformatics, especially for assembly. So groups that do lots of computation (e.g. for RNA-Seq) do, in my experience, have hardware of the sort I described. Having said that, analysis of a standard dataset with Cufflinks does not require 512Gb RAM. I was talking about needing that for 700 million reads, which is a LOT more than a typical dataset these days. And in the case a user needs more resources, they can always use Amazon EC2. We actually made an AMI a while back to help users who would want to do that so they don’t have to install Cufflinks and related tools/packages themselves.
November 2, 2013 at 4:06 pm
Heng Li (@lh3lh3)
The vast majority of computing nodes at both Broad and Sanger have 32GB or less RAM. They of course have machines with larger RAM, but thousands of jobs from many other projects are competing for them. If RAM is the problem here, Broad is not the best place to run cufflinks. On the other hand, I entirely agree that it is important to run cufflinks or another popular pipeline/tool. Someone in GTEx must have better hardware, I believe.
November 2, 2013 at 2:02 pm
edermitzakis
Of course one can run a single or a small number of samples with a large number of reads. The problem is doing hundreds or thousands of samples at high depth at a time for production. There was just too much memory used up and the pipeline was crashing. We had the same problem in another project with a similar sample size as GTEx, on a completely different cluster and pipeline. If you can provide advice on how to resolve this I would be delighted to put you in contact with the right person at the Broad. We will be very happy to have cufflinks as an option again for production.
November 2, 2013 at 2:10 pm
edermitzakis
Lior, we would be delighted to get your help to run it. Let me know if you are available to talk with the relevant people at the Broad.
As for eXpress, we did not attempt to run it since it was new software for us and was in fact published very close to the decision we had to make regarding transcript quantification. But we will surely evaluate it in the second phase of the project.
November 2, 2013 at 5:02 pm
Lior Pachter
Manolis, I looked again more carefully at our results in the eXpress paper.
The 700 million read limit for Cufflinks was reached using only *24Gb* RAM. Heng Li explains that most of the Broad Institute machines have 32Gb RAM (I didn’t know this as I’ve never worked there). So I don’t understand at all why GTEx was unable to run Cufflinks on all of the samples.
If GTEx’s intention is indeed to run Cufflinks on the samples, then I think that, more productive than email exchanges with me (I’m sorry, but I’m teaching a 250-student freshman math class this semester), would be for the Broad Institute people to first email gp-help@broadinstitute.org (the help address for GenePattern).
I’m not sure if you are familiar with GenePattern, but it is a “powerful genomic analysis platform” developed at the Broad Institute. According to Wikipedia it supports 23,000 registered users at over 2900 commercial and non-profit organizations. The RNA-Seq analysis suite for GenePattern is the Tuxedo tools, and in particular Cufflinks:
http://www.broadinstitute.org/cancer/software/genepattern/modules/RNA-seq/
Perhaps the Broad Institute should become a registered user of GenePattern.
November 3, 2013 at 1:43 am
edermitzakis
Lior, you obviously have plenty of time to engage in this rhetoric and an immense number of tweets for days now about the intentions or competence of GTEx investigators, but when asked to contribute to resolving a real issue with your software you are too busy.
As you might guess, we are also very busy, but we do spend the time to explain all the issues and contribute to the discussion. This is even though we are being criticized on impressions and assumptions, since we have not yet had the chance to present a published outcome of the project.
You might say it is not your responsibility to help. But then again, if you care so much about GTEx producing the best analysis, and you think we are not running cufflinks properly, why don’t you show us how to do it right? Or was this whole thing just criticism for criticism’s sake of a yet UNPUBLISHED study?
As for the deep technical details, it makes no sense to do this on the blog and I am not the right person to discuss them either, since I don’t know the Broad cluster technical details.
Of course GTEx will look into this again and attempt to put cufflinks and other equivalent tools into production. But it should be on the record that this is our second attempt to involve the cufflinks developers, so we should certainly not be accused of not trying.
Everyone that has constructive feedback is very welcome to send it to GTEx and its PIs by e-mail or to participate in our community meetings. And certainly we will be very happy to receive private or public scientific feedback to the publication when it comes out.
Manolis Dermitzakis, University of Geneva
November 3, 2013 at 2:05 am
Jari Niemi
I personally think that using the strategy “in our case it didn’t work well in practice” as main line of defense is quite disappointing. It is almost impossible to prove this statement wrong (there is no raw data available, no details how the tests were done, how many tools were evaluated, what were exactly the evaluation criteria, etc.). I dare to say that anybody outside of GTEx can prove this statement wrong! Also using this kind of strategy as main defense for GTex does not do anybody any favours especially to GTEx project!
Also, the second strategy, that is, personal attacks (e.g. “you … have plenty of time”, “you might say…”, “you care so much…”), should not happen in the scientific world! Please attack the statements, work, articles, methods, but no personal attacks!
November 3, 2013 at 2:07 am
Jari Niemi
typo: I dare to say that NObody outside of GTEx can prove this statement wrong
November 3, 2013 at 2:31 am
edermitzakis
First of all, we presented all the scientific arguments in the response and the comments after that. No point in reiterating them.
My argument is very scientific, not personal: when one engages in such criticism and then, once asked to contribute, backs out, interpretations are allowed.
Scientists are not anonymous entities but real individuals so the word “you” is not inappropriate.
As for the proof of our statement: it is impossible to prove everything that one says in science when it comes to such details (e.g. hard to demonstrate that one prepared a buffer properly).
There are three alternatives:
1) we are right and the software cannot run properly and in production mode in the Broad cluster (for whatever real reason);
2) The software could run well but we made a mistake (this is where Lior’s help would be useful);
3) we are lying about the fact that the software does not run properly and in production mode.
Is the suggestion that 3 might be the case?
Such a statement comes with a lot of responsibility on one’s end.
November 3, 2013 at 2:34 am
edermitzakis
BTW when one criticizes GTEx they do criticize individuals not an anonymous entity. Is that a personal attack?
November 3, 2013 at 2:41 am
Jari Niemi
Personally, I believe that there is just not enough data for outsiders of GTEx to figure out what happened in “in our case it didn’t work well”!
My guess, and totally my guess, is that some MICRO-evaluations were done at GTEx (that is, the tools were tested/evaluated with too few test datasets and cases, which gave a very distorted result)! But this is my personal opinion!
Nobody is lying here!
November 3, 2013 at 2:43 am
Jari Niemi
> BTW when one criticizes GTEx they do criticize individuals not an anonymous entity. Is that a personal attack?
My opinion is that attacking GTEx is not a personal attack because it is the same as attacking a published article, which is not a personal attack!
I believe “project = article” in this case!
November 3, 2013 at 2:46 am
edermitzakis
Happy to hear this and I agree.
I can assure you that the evaluation of what is efficient to run was not MICRO but was very painful and lengthy both in terms of analysis and in terms of discussion among GTEx investigators. And this applies to all tools not just for transcript quantification.
November 3, 2013 at 2:52 am
Jari Niemi
> I can assure you that the evaluation of what is efficient to run was not MICRO but was very painful and lengthy both in terms of analysis and in terms of discussion among GTEx investigators. And this applies to all tools not just for transcript quantification.
Again, for me as an outsider of GTEx it is impossible to prove that this is not true. This is the fault of this kind of strategy (“in our case it didn’t work well”). Whatever I guess is very easy to prove not true.
November 3, 2013 at 2:50 am
edermitzakis
>My opinion is that attacking GTEx is not a personal attack because it is the same as attacking a published article, which is not a personal attack!
I believe “project = article” in this case!
Not sure I agree on this one. When one criticizes an article the senior and first authors take the responsibility. It is not anonymous or diluted responsibility.
November 3, 2013 at 3:32 am
Jari Niemi
> Not sure I agree on this one. When one criticizes an article the senior and first authors take the responsibility. It is not anonymous or diluted responsibility.
Let’s agree that we disagree on this!
November 3, 2013 at 3:06 am
edermitzakis
I am afraid that one has to trust the investigators in some cases. There are many things like that in science. At the end of the day, the point is not whether cufflinks or flux could run or not or which one is best. This is really a side issue.
The GTEx consortium has two main responsibilities: 1) to produce and release high quality data; 2) to produce a set of analysis that uses these data to learn important biological insight.
We can comfortably say that we are doing no 1 very well.
For no 2: is the GTEx analysis at the appropriate level for such an important project? We think yes but cannot give you ALL the justification while we are still finalizing analysis interpretation and writing the paper. Given that we have not yet submitted the paper it is very unfair to be criticized on that both on methodology and biological insight. I hope people like the paper when it is published and we are happy to receive feedback.
November 3, 2013 at 3:12 am
edermitzakis
BTW it would be nice to know who you are: I can only see a footballer with your name on Google 🙂
November 3, 2013 at 3:31 am
Jari Niemi
> BTW would be nice to know who you are: I can only see a footballer with your name on google
I am developing this:
https://code.google.com/p/fusioncatcher/
November 3, 2013 at 4:00 pm
Dario Strbenac
I think an important feature of software is how regularly it is maintained. If the consortium uses a software package developed by a researcher in the consortium, they can force bugs in the software to be fixed quickly.
In contrast, users of Cufflinks reported some problems, like the parameter estimate not being inside the lower and upper bounds of the 95% confidence interval, about six months ago on the Tuxedo Tools Google group, but it still hasn’t been fixed.
Our laboratory stopped using Cufflinks this year because we assumed it was no longer being maintained. I remember there was also an issue in the past where the bias correction was generating expression estimates of infinity for the majority of transcripts. A version of Cufflinks that fixed that came out over six months later.
November 3, 2013 at 4:16 pm
Lior Pachter
With all due respect, the developers of Cufflinks (Cole Trapnell, Adam Roberts, Geo Pertea, Loyal Goff and others who have pitched in at times) have worked very hard to fix bugs and add features. There is a complete record on the Cufflinks website. In contrast, no one has even used Flux Capacitor (except the author) so not only have bugs not been fixed, they very likely have not even been discovered.
I’m sorry if you found a bug in Cufflinks and it was not yet fixed. At times things take a while, especially since we get a lot of feedback from many people, and it’s hard sometimes to tell whether it was a bug in Cufflinks or an issue on the user side. But the main problem with bugs is to find them, and that is best done when thousands of users report issues, not one.
November 3, 2013 at 7:15 pm
Rob Patro
I’d just like to chime in here and say that as far as open-source, academically developed software goes, I find that Cufflinks is exceptionally well-maintained. The first public release of cufflinks was in September of 2009 and the most recent was in April of this year. In that time, there have been over 20 updates. Software like Cufflinks isn’t developed professionally, and there isn’t a team of software engineers who are paid full-time to service user requests and meet ambitious release deadlines. Nevertheless, the Cufflinks developers have been providing bug fixes and feature updates for close to four years now, and I find that a very competitive and respectable track record for software developed and maintained by a group of individuals who are, no doubt, trying to do new research while still supporting the requests of those who chose to adopt their software. That said, the Cufflinks development and distribution process is a product of its (original) times. I wonder if it might not benefit from adopting a newer distribution platform for the code. For example, the code for Bowtie is now hosted on GitHub, and this provides an easy mechanism (pull-requests) for interested, external developers to fix bugs or add features and request that the official developers merge them into the code. Either way, I think you may be hard-pressed to find academically-developed software with as long a shelf-life and as active an update history as Cufflinks.
November 4, 2013 at 6:44 am
Cole Trapnell
Just a quick note about Cufflinks maintenance: Cufflinks, Cuffdiff, and the other tools *are* still under active maintenance. The issue you raised has been fixed internally, and the fix will be released with version 2.2, hopefully in the next few weeks. Pushing up Cufflinks releases takes a lot of work, because with each revision, we need to expand the test suite with new regression tests, run the full suite, and QC the results. This takes a lot of time, and as Rob Patro pointed out, we don’t have much these days.
That issue in particular is in our eyes relatively minor because most of our users don’t rely on the confidence intervals for use in downstream analysis, and none of our code, including Cuffdiff, uses them for anything. It was also introduced with version 2.1, so only users that have upgraded to the latest build would see it, and many users are still using older versions. So that bug was not flagged as a showstopper or one that justified a new release all by itself. That said, bugs are annoying, and we try to fix them all.
I am the only one taking care of the Cufflinks and Cuffdiff code right now, and I have a number of other projects (many of which include tissue culture and other wetwork). I expect my schedule to free up by the end of the month, when I can put some serious time in for a new release.
In any case, I’m sorry that this has taken so long, and appreciate your patience.