You are currently browsing the monthly archive for January 2019.

The title of this blog post is a phrase coined by Paul Wouters and Rodrigo Costas in their 2012 publication Users, Narcissism and Control—Tracking the Impact of Scholarly Publications in the 21st Century. By “technologies of narcissism”, Wouters and Costas mean tools that allow individuals to rapidly assess the impact, usage and influence of their publications without much effort.  One of the main points that Wouters and Costas try to convey is that individuals using technologies of narcissism must exercise “great care and caution” due to the individual level focus of altmetrics.

I first recall noticing altmetrics associated with one of my papers after publishing “Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities” in PLoS Computational Biology in 2005. Public Library of Science (PLoS) was one of the first publishers to collect and display views and downloads of papers, and I remember being intrigued the first time I noticed my paper statistics. Eventually I developed the habit of revisiting the paper website frequently to check “how we were doing”. I’m probably responsible for at least a few dozen of the downloads that led to the paper making the “top ten” list the following year. PLoS Computational Biology even published a paper where they displayed paper rankings (by downloads). Looking back, while PLoS was a pioneer in developing technologies of narcissism, it was the appetite for them from individuals such as myself that drove a proliferation of new metrics and companies devoted to disseminating them. For example, a few years later in 2012, right when Wouters and Costas were writing about technologies of narcissism, Altmetric.com was founded and today is a business with millions of dollars in revenue, dozens of employees, and a name that is synonymous with the metrics they measure.

Today Altmetric.com’s “Attention Score” is prominently displayed alongside articles in many journals (for example all of the publications of the Microbial Society) and even bioRxiv displays it. In fact, the importance of altmetrics to bioRxiv is evident as they are mentioned in the very first entry of the bioRxiv FAQ, which states that “bioRxiv provides usage metrics for article views and PDF downloads, as well as altmetrics relating to social media coverage. These metrics will be inaccurate and underestimate actual usage in article-to-article comparisons if an article is also posted elsewhere.” Altmetric.com has worked hard to make it easy for anyone to embed the “Altmetric Attention Score” on their website, and in fact some professors now do so:

What does Altmetric.com measure? The details may surprise you. For example, Altmetric tracks article mentions in Wikipedia, but only in the English, Finnish and Swedish Wikipedias. Tweets, retweets and quoted tweets of articles are tracked, but not all of them. This is because Altmetric.com was cognizant of the possibility of “gaming the system” from the outset and it therefore looks for “evidence of gaming” to try to defend against manipulation of its scores. The “Gaming altmetrics” blogpost by founder Euan Adie in 2013 is an interesting read. Clearly there have been numerous attempts to manipulate the metrics they measure. He writes “We flag up papers this way and then rely on manual curation (nothing beats eyeballing the data) to work out exactly what, if anything, is going on.”

Is anything going on? It’s hard to say. But here are some recent comments on social media:

Perhaps this exchange is tongue-in-cheek but notice that by linking to a bioRxiv preprint the second tweet in the series above actually affected the Altmetric Attention Score of that preprint:

Here is another exchange:

Apparently they are not the first to have this idea:

The last tweet is by a co-founder of bioRxiv. These recent “jokes” (I suppose?) about altmetrics are in response to a recent preprint by Abdill and Blekhman (note that I’ve just upped the Altmetric Attention Score of the preprint by linking to it from this blog; Altmetric.com tracks links from manually curated lists of blogs). The Abdill-Blekhman preprint included an analysis showing a strong correlation between paper downloads and the impact factor of journals where they are published:

The analogous plot showing the correlation between tweets and citations per preprint (averaged by journal where they ended up being published) was made by Sina Booeshaghi and Lynn Yi last year (GitHub repository here):

There are some caveats to the Booeshaghi-Yi analysis (the number of tweets per preprint is capped at 100) but it shows a similar trend to the Abdill-Blekhman analysis. One question which these data raise (and which the convenient API by Abdill and Blekhman makes it possible to study) is what is the nature of this correlation? Are truly impactful results in preprints being recognized as such (independently) by both journal reviewers/editors and scientists via Twitter, or are the altmetrics of preprints directly affecting where they end up being published? The latter possibility is disturbing if true. Twitter activity is highly biased and associated with many factors that have nothing to do with scientific interest or relevance. For example, women are less influential on Twitter than men (men are twice as likely to be retweeted as women). Thus, the question of causation in this case is not just of academic interest; it is also of career importance for individuals, and important for science as a whole. The data of Abdill and Blekhman will be useful in studying the question, and are an important starting point to assimilate and build on. I am apparently not the only person who thinks so; Abdill and Blekhman’s preprint is already “highly downloaded”, as can be seen on a companion website to the preprint called Rxivist.

The Rxivist website is useful for browsing bioRxiv, but it does something else as well. For the first time, it makes accessible two altmetric statistics (paper downloads and tweets) via “author leaderboards”. Unlike Altmetric.com, which appears to carefully (and sometimes manually) curate the statistics going into its final score, and which acts against manipulation of tweets by filtering for bots, the Rxivist leaderboards are based on raw data. This is what a leaderboard of “papers downloaded” looks like:

The fact is that Stephen Floor is right; it is already accepted that the number of times a preprint has been downloaded is relevant and important:

But this raises a question: are concerns about gaming the system overblown, or are they a real problem? How hard is it, really, to write a bot to boost one’s download statistics? Has someone done it already?

Here is a partial answer to the questions above in the form of a short script that downloads any preprint (also available on the blog GitHub repository where the required companion chromedriver binary is also available):

The scientific community would be remiss to ignore the proliferation of technologies of narcissism. These technologies can have real benefit, primarily by providing novel ways to help researchers identify interesting and important work that is relevant to the questions they are pursuing. Furthermore, one of the main advantages of the open licensing of resources such as bioRxiv or PLoS is that they permit natural language processing to facilitate automatic prioritization of articles, search for relevant literature, and mining for specific scientific terms (see e.g., Huang et al. 2016). But I am loath to accept a scientific enterprise that rewards winners of superficial, easily gamed, popularity contests.

The long-standing practice of data sharing in genomics can be traced to the Bermuda principles, which were formulated during the human genome project (Contreras, 2010). While the Bermuda principles focused on open sharing of DNA sequence data, they heralded the adoption of other open source standards in the genomics community. For example, unlike many other scientific disciplines, most genomics software is open source and this has been the case for a long time (Stajich and Lapp, 2006). The open principles of genomics have arguably greatly accelerated progress and facilitated discovery.

While open sourcing has become de rigueur in genomics dry labs, wet labs remain beholden to commercial instrument providers that rarely open source hardware or software, and impose draconian restrictions on instrument use and modification. With a view towards joining others who are working to change this state of affairs, we’ve posted a new preprint in which we describe an open source syringe pump and microscope system called poseidon:

A. Sina Booeshaghi, Eduardo da Veiga Beltrame, Dylan Bannon, Jase Gehring and Lior Pachter,

The poseidon system consists of

• A syringe pump that can operate at a wide range of flow rates. The bulk cost per pump is \$37.91. A system of three pumps that can be used for droplet-based single-cell RNA-seq experiments can be assembled for \$174.87.
• A microscope system that can be used to evaluate the quality of emulsions produced using the syringe pumps. The cost is \$211.69.
• Open source software that can be used to operate four pumps simultaneously, either via a Raspberry Pi (that is part of the microscope system) or directly via a laptop/desktop.

Together, these components can be used to build a Drop-seq rig for under \$400 (the three-pump system at \$174.87 plus the microscope at \$211.69 comes to \$386.56), or they can be used piecemeal for a wide variety of tasks. Along with describing benchmarks of poseidon, the preprint presents design guidelines that we hope can accelerate both development and adoption of open source bioinstruments. These were developed while working on the project; some were borrowed from our experience with bioinformatics software, while others emerged as we worked out kinks during development. As is the case with software, open source is not, in and of itself, enough to make an application usable. We had to optimize many steps during the development of poseidon, and in the preprint we illustrate the design principles we converged on with specific examples from poseidon.

The complete hardware/software package consists of the following components:

We benchmarked the system thoroughly and it has similar performance to a commercial Harvard Apparatus syringe pump; see panel (a) below. The software driving the pumps can be used for infusion or withdrawal, and is easily customizable. In fact, the ability to easily program arbitrary schedules and flow rates, without depending on vendors who frequently charge money and require firmware upgrades for basic tasks, was a major motivation for undertaking the project. The microscope is basic but usable for setting up emulsions. Shown in panel (b) below is a microfluidic droplet generation chip imaged with the microscope. Panel (c) shows that we have no trouble generating uniform emulsions with it.

Together, the system constitutes a Drop-seq rig (3 pumps + microscope) that can be built for under \$400:

We did not start the poseidon project from scratch. First of all, we were fortunate to have some experience with 3D printing. Shortly after I started setting up a wet lab, Shannon Hateley, a former student in the lab, encouraged me to buy a 3D printer to reduce costs for basic lab supplies. The original MakerGear M2 we purchased has served us well, saving us enormous amounts of money just as Shannon predicted, and in fact we have now added a Prusa printer:

The printer Shannon introduced to the lab came in handy when, some time later, after starting to perform Drop-seq in the lab, Jase Gehring became frustrated with the rigidity of the commercial syringe pumps he was using. With a 3D printer available in-house, he was able to print and assemble a published open source syringe pump design. What started as a hobby project became more serious when two students joined the lab: Sina Booeshaghi, a mechanical engineer, and Eduardo Beltrame, an expert in 3D printing. We were also encouraged by the publication of a complete Drop-seq do-it-yourself design from the Satija lab. Starting with the microscope device from the Stephenson et al. paper, and the syringe pump from the Wijnen et al. paper, we worked our way through numerous hardware design optimizations and software prototypes. The photo below shows the published work we started with at the bottom, the final designs we ended up with at the top, and intermediate versions as we converged on design choices:

In the course of design we realized that despite a lot of experience developing open source software in the lab, there were new lessons we were learning regarding open-source hardware development and hardware-software integration. We ended up formulating six design principles that we explain in detail in the preprint via examples of how they pertained to the poseidon design:

We are hopeful that these principles we adhered to will serve as useful guidelines for others interested in undertaking open source bioinstrumentation projects.

This post is a review of a recent preprint on an approach to testing for RNA-seq gene differential expression directly from transcript compatibility counts:

Marek Cmero, Nadia M Davidson and Alicia Oshlack, Fast and accurate differential transcript usage by testing equivalence class counts, bioRxiv 2018.

To understand the preprint two definitions are important. The first is of gene differential expression, which I wrote about in a previous blog post and is best understood, I think, with the following figure (reproduced from Supplementary Figure 1 of Ntranos, Yi, et al., 2018):

In this figure, two isoforms of a hypothetical gene are called primary and secondary, and two conditions in a hypothetical experiment are called “A” and “B”. The black dots labeled conditions A and B have x-coordinates $x_A$ and $x_B$ corresponding to the abundances of the primary isoform in the respective conditions, and y-coordinates $y_A$ and $y_B$ corresponding to the abundances of the secondary isoform. In data from the hypothetical experiment, the black dots represent the mean level of expression of the constituent isoforms as derived from replicates. Differential transcript expression (DTE) refers to change in one of the isoforms. Differential gene expression (DGE) refers to change in overall gene expression (i.e. expression as the sum of the expression of the two isoforms). Differential transcript usage (DTU) refers to change in relative expression between the two isoforms, and gene differential expression (GDE) refers to change in expression along the red line. Note that DTE, DGE and DTU are special cases of GDE.
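The distinctions above can be made concrete with a small, noise-free sketch. The abundances below are made up for illustration, the function name `classify_change` is hypothetical, and treating GDE as "the point $(x, y)$ moved at all" is my reading of the figure rather than a formal definition; real analyses work with replicates and statistical tests, not exact comparisons.

```python
def classify_change(cond_a, cond_b, tol=1e-9):
    """Label the change between two conditions for a two-isoform gene.

    Each condition is a pair (primary abundance, secondary abundance).
    Assumes the gene is expressed (non-zero total) in both conditions.
    """
    x_a, y_a = cond_a
    x_b, y_b = cond_b
    total_a, total_b = x_a + y_a, x_b + y_b

    dte = abs(x_b - x_a) > tol or abs(y_b - y_a) > tol   # some isoform changed
    dge = abs(total_b - total_a) > tol                   # sum of isoforms changed
    dtu = abs(x_b / total_b - x_a / total_a) > tol       # relative usage changed
    return {"DTE": dte, "DGE": dge, "DTU": dtu, "GDE": dte or dge or dtu}


# Hypothetical scenarios (values are invented for illustration only).
scenarios = {
    "both isoforms double": ((10.0, 10.0), (20.0, 20.0)),
    "isoform switch":       ((15.0, 5.0), (5.0, 15.0)),
    "primary only changes": ((10.0, 10.0), (20.0, 10.0)),
}

for name, (a, b) in scenarios.items():
    print(name, classify_change(a, b))
```

The "isoform switch" scenario is the interesting one: the gene total is unchanged, so a test on gene counts alone would see nothing, while usage has changed completely.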

The Cmero et al. preprint describes a method for testing for GDE, and the method is based on comparison of equivalence classes of reads between conditions. There is a natural equivalence relation $\sim$ on the set of reads in an RNA-seq experiment, where two reads $r_1$ and $r_2$ are related by $\sim$ when $r_1$ and $r_2$ align (ambiguously) to exactly the same set of transcripts (see, e.g. Nicolae et al. 2011). The equivalence relation $\sim$ partitions the reads into equivalence classes, and, in a slight abuse of notation, the term “equivalence class” in RNA-seq is used to denote the set of transcripts corresponding to an equivalence class of reads. Starting with the pseudoalignment program kallisto published in Bray et al. 2016, it became possible to rapidly obtain the (transcript) equivalence classes for reads from an RNA-seq experiment.

In previous work (Ntranos et al. 2016) we introduced the term transcript compatibility counts to denote the cardinality of the (read) equivalence classes. We thought about this name carefully; due to the abuse of notation inherent in the term “equivalence class” in RNA-seq, we felt that using “equivalence class counts” would be confusing as it would be unclear whether it refers to the cardinalities of the (read) equivalence classes or the (transcript) “equivalence classes”.
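As a minimal sketch of the two notions, the snippet below groups reads by the set of transcripts they are compatible with and then counts the reads in each group. The read and transcript identifiers are invented, and the dictionary stands in for the output of a pseudoaligner; this is not how kallisto stores or reports its results.

```python
from collections import Counter

# Toy compatibility table: read id -> set of transcripts the read is compatible with.
# All identifiers are hypothetical.
read_compat = {
    "r1": {"t1", "t2"},
    "r2": {"t1", "t2"},
    "r3": {"t1"},
    "r4": {"t2", "t3"},
    "r5": {"t1", "t2"},
}


def transcript_compatibility_counts(compat):
    """Count reads per equivalence class.

    Two reads are equivalent when their transcript sets are identical, so the
    frozenset of transcripts serves as the key for the (read) equivalence class.
    """
    return Counter(frozenset(transcripts) for transcripts in compat.values())


tcc = transcript_compatibility_counts(read_compat)
for transcripts, count in sorted(tcc.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(transcripts), count)
```

The keys of `tcc` are the (transcript) "equivalence classes" and the values are the transcript compatibility counts, which is exactly the distinction the terminology is meant to keep clear.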

With these definitions at hand, the Cmero et al. preprint can be understood to describe a method for identifying GDE between conditions by directly comparing transcript compatibility counts. The Cmero et al. method is to perform Šidák aggregation of p-values for equivalence classes, where the p-values are computed by comparing transcript compatibility counts for each equivalence class with the program DEXSeq (Anders et al. 2012). A different method that also identifies GDE by directly comparing transcript compatibility counts was previously published by my student Lynn Yi in Yi et al. 2018. I was curious to see how the Yi et al. method, which is based on Lancaster aggregation of p-values computed from transcript compatibility counts, compares to the Cmero et al. method. Fortunately it was really easy to find out because Cmero et al. released code with their paper that can be used to make all of their figures.
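For reference, the two aggregation rules can be written side by side. These are the standard textbook forms of the Šidák-adjusted minimum p-value and of Lancaster's method, not formulas copied from either paper, and the notation is an assumption introduced here: $p_1, \ldots, p_n$ are the p-values of the $n$ equivalence classes of a gene, $w_i > 0$ are weights, and $F_w$ is the cumulative distribution function of a $\chi^2$ distribution with $w$ degrees of freedom.

$$p_{\text{Šidák}} = 1 - \Bigl(1 - \min_{1 \le i \le n} p_i\Bigr)^{n}$$

$$T = \sum_{i=1}^{n} F_{w_i}^{-1}(1 - p_i), \qquad p_{\text{Lancaster}} = 1 - F_{W}(T), \qquad W = \sum_{i=1}^{n} w_i$$

With $w_i = 2$ for every $i$, $T = -2\sum_{i=1}^{n} \log p_i$ and Lancaster's method reduces to Fisher's method. The Šidák rule depends only on the smallest p-value, whereas the Lancaster statistic pools evidence from all of them.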

I would like to note how much fun it is to reproduce someone else’s work. It is extremely empowering to know that all the methods of a paper are pliable at the press of a button. Below is the first results figure, Figure 2, from Cmero et al.’s paper:

Below is the same figure reproduced independently using their code (and downloading the relevant data):

It’s beautiful to see not only apples-to-apples, but the exact same apple! Reproducibility is obviously important to facilitate transparency in papers and to ensure correctness, but its real value lies in the fact that it allows for modifying and experimenting with methods in a paper. Below is the second results figure, Figure 3, from Cmero et al.’s paper:

The figure below is the reproduction, but with an added analysis in Figure 3a, namely the method of Yi et al. 2018 included (shown in orange as “Lancaster_equivalence_class”).

The additional code required for the extra analysis is just a few lines and can be downloaded from the Bits of DNA GitHub repository:

```r
library(DEXSeq)       # provides DEXSeqResults(); needed if not already loaded
library(aggregation)  # provides lancaster()
library(dplyr)

# dm_ec_results and hs_ec_results are assumed to come from the Cmero et al. scripts.
# Drosophila: aggregate per-equivalence-class DEXSeq p-values to the gene level with
# the Lancaster method, weighting by log(exonBaseMean), then adjust with BH.
dm_dexseq_results <- as.data.frame(DEXSeqResults(dm_ec_results$dexseq_object))
dm_lancaster_results <- dm_dexseq_results %>% group_by(groupID) %>% summarize(pval = lancaster(pvalue, log(exonBaseMean)))
dm_lancaster_results$gene_FDR <- p.adjust(dm_lancaster_results$pval, 'BH')
dm_lancaster_results <- data.frame(gene = dm_lancaster_results$groupID,
                                   FDR = dm_lancaster_results$gene_FDR)

# Human: the same procedure applied to the hsapiens results.
hs_dexseq_results <- as.data.frame(DEXSeqResults(hs_ec_results$dexseq_object))
hs_lancaster_results <- hs_dexseq_results %>% group_by(groupID) %>% summarize(pval = lancaster(pvalue, log(exonBaseMean)))
hs_lancaster_results$gene_FDR <- p.adjust(hs_lancaster_results$pval, 'BH')
hs_lancaster_results <- data.frame(gene = hs_lancaster_results$groupID,
                                   FDR = hs_lancaster_results$gene_FDR)
```

A zoom-in of Figure 3a below shows that the improvement of Yi et al.’s method in the hsapiens dataset over the method of Cmero et al. is as large as the improvement of aggregation (of any sort) over GDE based on transcript quantifications. Importantly, this is a true apples-to-apples comparison because Yi et al.’s method is being tested on exactly the data and with exactly the metrics that Cmero et al. chose:

The improvement is not surprising; an extensive comparison of Lancaster aggregation with Šidák aggregation is detailed in Yi et al., and there we noted that while Šidák aggregation performs well when transcripts are perturbed independently, it performs very poorly in the more common case of correlated effects. Furthermore, we also examined in detail DEXSeq’s aggregation (perGeneQvalue), which appears to be an attempt to perform Šidák aggregation but is not quite right, in a sense we explain in detail in Section 2 of the Yi et al. supplement. While DEXSeq’s implementation of Šidák aggregation does control the FDR, it will tend to report genes with many isoforms and consume the “FDR budget” faster than Šidák aggregation. This is one reason why, for the purpose of comparing Lancaster and Šidák aggregation in Yi et al. 2018, we did not rely on DEXSeq’s implementation of Šidák aggregation. Needless to say, separately from this issue, as mentioned above we found that Lancaster aggregation substantially outperforms Šidák aggregation.
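The intuition for why a minimum-based rule struggles when effects are spread across transcripts can be seen in a toy calculation. The sketch below is an illustration under stated assumptions, not the implementation used in Yi et al. or Cmero et al.: it compares the Šidák-adjusted minimum with Fisher's method, which is the equal-weight special case of Lancaster aggregation and has a closed-form p-value that needs only the standard library. The p-values are invented, and Fisher's method assumes independent p-values under the null.

```python
import math


def sidak(pvalues):
    """Šidák-adjusted minimum p-value; only the smallest p-value matters."""
    n = len(pvalues)
    return 1.0 - (1.0 - min(pvalues)) ** n


def fisher(pvalues):
    """Fisher's method (Lancaster aggregation with every weight equal to 2).

    Requires p-values in (0, 1]. Uses the closed-form survival function of a
    chi-square distribution with 2n degrees of freedom.
    """
    n = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # half of -2 * sum(log p)
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half_stat / k          # (half_stat ** k) / k!
        total += term
    return math.exp(-half_stat) * total


# Made-up p-values for the equivalence classes of two hypothetical genes.
concordant = [0.04] * 5                    # modest evidence in every class
single_hit = [0.001, 0.9, 0.8, 0.7, 0.6]   # one strong signal, the rest null-like

for name, pvals in [("concordant", concordant), ("single_hit", single_hit)]:
    print(name, "sidak:", sidak(pvals), "fisher:", fisher(pvals))
```

For the concordant gene the Šidák value is about 0.18 while the Fisher value is below 0.001; for the single-hit gene the ordering reverses, which is consistent with the observation that Šidák aggregation does well when a single transcript is perturbed.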

The figures below complete the reproduction of the results of Cmero et al. The reproduced figures are very similar to Cmero et al.’s figures but not identical. The difference is likely due to the fact that the Cmero paper states that a full comparison of the “Bottomly data” (on which these results are based) is a comparison of 10 vs. 10 samples. The reproduced results are based on downloading the data, which consists of 10 vs. 11 samples for a total of 21 samples (this is confirmed in the Bottomly et al. paper, which states that they “generated single end RNA-Seq reads from 10 B6 and 11 D2 mice”). I also noticed one other small difference in the Drosophila analysis shown in Figure 3a, where one of the methods is different for reasons I don’t understand. As for the supplement, the Cmero et al. figures are shown on the left hand side below, and to their right are the reproduced figures:

The final supplementary figure is a comparison of kallisto to Salmon: the Cmero et al. paper shows that Salmon results are consistent with kallisto results shown in Figure 3a,  and reproduces the claim I made in a previous blog post, namely that Salmon results are near identical to kallisto:

The final paragraph in the discussion of Cmero et al. states that “[transcript compatibility counts] have the potential to be useful in a range of other expression analysis. In particular [transcript compatibility counts] could be used as the initial unit of measurement for many other types of analysis such as dimension reduction visualizations, clustering and differential expression.” In fact, transcript compatibility counts have already been used for all these applications and have been shown to have numerous advantages. See the papers:

Many of these papers were summarized in a talk I gave at Cold Spring Harbor in 2017 on “Post-Procrustean Bioinformatics”, where I emphasized that instead of fitting methods to the predominant data types (in the case of RNA-seq, gene counts), one should work with data types that can support powerful analysis methods (in the case of RNA-seq, transcript compatibility counts).