A recent opinion piece titled “A decade of molecular cell atlases” by Stephen Quake narrates the incredible single-cell genomics technology advances that have taken place over the last decade, and how they have translated to increasingly resolved cell atlases. While the article tells some fascinating stories (apparently when hearing a report about the CZI mouse cell atlas Priscilla Chan remarked “why don’t we just do human?” and thus the idea for a human cell atlas was born), it contains several errors and omissions. I have summarized some of them below, and have sent a copy to Trends in Genetics, where the opinion piece was published, requesting corrections [Update April 5, 2022: Trends in Genetics rejected my submission; it is posted on the arXiv]:

  • Quake writes that “The year 2017 marked the release of our Tabula Muris data set and preprint” and cites the 2nd version of a preprint on the bioRxiv posted on March 29, 2018, that later became an article in Nature on October 3, 2018. While it is true that the preprint was first posted on December 20, 2017, at that time the data was not released. The data was only released in the 2nd version of the preprint on March 29, 2018 (the data was made publicly available on GEO at accession GSE109774 on March 19, 2018). Without the data, namely the reads which were processed, it is not possible to verify or reproduce results from a paper, nor, in the case of single-cell transcriptomics, is it possible to build on work by uniformly processing it together with a new dataset for joint analysis. Notably, Quake’s false claim that the preprint and data were released in 2017 is a repeat of what he stated in a lecture titled “The Cell is a Bag of RNA” (an apt analogy of David Lilley). In the lecture Quake specifically said that “in this whole gap [from December 20, 2017 until the paper was published in October 2018] where normally you wouldn’t have access to the paper or the data, the whole world did because we put it on the [bio]arXiv… not just the manuscript but all the data.”
  • The error in describing the date when the Tabula Muris data was shared is significant in light of Quake’s narrative that Tabula Muris was “the first mammalian whole-organism cell atlas”. Quake describes another single-cell RNA-seq based mouse cell atlas published by Guoji Guo’s group at Zhejiang University on February 22, 2018 (Han et al., Mapping the Mouse Cell Atlas by Microwell-Seq, Cell), as “further work”. Guo’s data was publicly available on GEO on February 14, 2018 (GSE108097) along with the publication, a date that preceded the release of the Tabula Muris data. In fact, in the Tabula Muris preprint update on March 29, 2018, the Han et al. 2018 paper data is analyzed in conjunction with the Tabula Muris data, with the authors concluding that “independent datasets generated from various atlases that are beginning to arise can be combined and collectively analyzed…”. Thus, it was the Tabula Muris paper that was “further work” following the Han et al. 2018 paper, not the other way around. The timeline that Quake presents is shown on the left (screenshot from his “The Cell is a Bag of RNA” talk); the actual (edits by me in red) timeline is on the right:
  • Quake mischaracterizes another paper from the Guo group, namely Han et al., Construction of a human cell landscape at single-cell level, Nature, 2020. He refers to this paper as one of several that represent “a distinct strategy of compiling cell atlases one tissue at a time.” However, the Han et al. 2020 paper analyzed samples of both fetal and adult tissue, and covered 60 human tissue types which were assayed in 2-4 replicates. Han et al. 2020 also examined several types of cell culture, including induced pluripotent stem cells, embryoid body cells, haematopoietic cells derived from co-cultures of human H9 and mouse OP9 cells, and pancreatic beta cells derived from H9 cells. The scope of the Han et al. 2020 paper is apparent in Figure 1 of their paper:
Figure 1 from Han et al., 2020.
  • Quake similarly misrepresents two other human cell atlas papers. The first is He et al., Single-cell transcriptome profiling of an adult human cell atlas of 15 major organs, Genome Biology, 2020. This paper, whose title makes clear it mapped cell types in 15 organs, is also described by Quake as “representing a distinct strategy of compiling cell atlases one tissue at a time”, implying that only one tissue or organ was assayed. The same is the case for the second, Cao et al., A human cell atlas of fetal gene expression, Science, 2020, which derived cell types using single-cell gene expression and chromatin landscape data from 15 organs.
  • When highlighting “his” Tabula Sapiens, which was first preprinted in 2021, Quake fails to mention the human cell atlas papers above, and instead mentions only preprints from the Broad and Sanger which were also published in 2021, and a paper from what he calls a “Swedish consortium” (the work was conceived and designed by Mathias Uhlén and Cecilia Lindskog). This omission makes it seem that Tabula Sapiens was the first human cell atlas to be published, along with a handful of others preprinted at the same time and one published concurrently, when in fact that was not the case.
  • Quake characterizes the Tabula Muris as “representing the first mammalian whole-organism cell atlas.” As noted above, it was not “first”, but priority claims aside, the description as a “whole-organism cell atlas” needs to be qualified. Here is how the project is characterized in the published paper: “Although these data are by no means a complete representation of all mouse organs and cell types, they provide a first draft attempt to create an organism-wide representation of cellular diversity.”
  • In reviewing the technology developments that led to high-throughput single-cell RNA-seq, Quake omits several important advances. There is a large body of work to refer to and cite, including several key advances in barcoding of beads to identify cells and barcoding for distinguishing molecules. For the latter, see, e.g., Shiroguchi et al., 2012 (from the lab of Sunney Xie).
  • The paper declares 2011-2012 to be “seminal years” in conceptualizing the notion of a transcriptomic cell atlas. While it’s true that those were “seminal” years for Quake when he published his own sperm (Wang et al., 2012), the timeline seems arbitrary and possibly self-serving. The Tang et al. paper in 2009 could just as well be taken as the starting point for “conceptualizing the notion of a transcriptomic cell atlas”. Tang et al. write specifically that “For example, mouse embryonic stem cells, probably the most thoroughly analyzed type of stem cells, contain multiple subpopulations with strong differences in both gene expression and physiological function. Therefore, a more sensitive mRNA-Seq assay, ideally an assay capable of working at single cell resolution, is needed to meaningfully study crucial developmental processes and stem cell biology.” Similarly, Long et al., in “A 3D digital atlas of C. elegans and its application to single-cell analyses” published in 2009, were anticipating “the notion of a transcriptomic cell atlas”, noting that their technology would be particularly useful for “high-throughput analysis of cellular information such as gene expression at single-cell resolution.” Alternatively, a reasonable starting point for consideration could be 2014, when cells actually started to be assayed en masse:
Figure from Svensson et al., 2020.
  • The Long et al. paper brings to the fore the field of spatial transcriptomics, which Quake ignores entirely in his review. However, scientists in that field were already conceptualizing the notion of a transcriptomic cell atlas; in fact spatial transcriptomics was arguably the domain where most of the ideas pervasive in single-cell genomics today originated (see the Museum of Spatial Transcriptomics for a detailed history and review).
  • Another important omission in the Quake opinion is the discussion of the computational biology technologies crucial for cell atlases. None of the Tabula papers, or for that matter any of the single-cell transcriptomics papers that have been written during the past few years, would have been possible without the Seurat and Scanpy programs from the Satija and Theis labs respectively. More importantly, the atlases themselves are, fundamentally, a product of the computational tools used to analyze the data. For example, in the Tabula Microcebus the annotated cell types were obtained by analyzing the 10X Genomics single-cell RNA-seq data “through dimensionality reduction by principal component analysis, visualization in 2-dimensional projections with t-Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), and clustering by the Louvain method.” These methods are tuned with numerous parameters and even evaluating when they are accurate is challenging (Kiselev et al., 2019). With regard to dimensionality reduction, numerous problems have been documented with t-SNE and UMAP, leading to questions about results based on interpreting them (see, e.g., Chari et al., 2021, Cooley et al., 2022). The only mention of computational technology in the article is a comment that “Similarly, the development of new algorithms and computational approaches was also a powerful enabler of the field as it now exists.”
  • The omission of computational technology is chalked up to space constraints, yet there was apparently enough space for Quake to narrate an origin story of the Human Cell Atlas project in which he centers himself, instead of Aviv Regev and Sarah Teichmann whose contribution was much more than to have “asked whether various efforts should be merged into an international collaboration”. They were early champions of a collaborative human cell atlas project and have co-chaired the organizing committee from the outset. Teichmann co-founded the Wellcome Trust Sanger Institute Single Cell Genomics Centre in 2013, and by 2015 had been awarded the EMBO Gold Medal in part for her contributions to, and vision for, single-cell transcriptomics. Regev pioneered many of the single-cell RNA-seq technology developments that enabled single-cell genomics, including single-cell studies of immune cells in 2013 and spatial single-cell RNA-seq in 2015. By the time of the inaugural Human Cell Atlas meeting in London in 2016, Regev had been widely publicizing a vision for a “periodic table of cells” and Teichmann had joined forces with her to develop a joint vision to accomplish the task (see article in the Pacific Standard, 2018).
  • There are a few minor errors in the paper. Quake writes that “These [microfluidics automation technologies] were eventually commercialized by a company I founded called Fluidigm..”. In fact, Quake did not found Fluidigm by himself; the company was co-founded with Gajus Worthington. Miriam Merad’s name is incorrectly spelled as “Meriam” and Christophe Benoist’s name is incorrectly spelled as “Benoiste”. A recurring typo is the misspelling of Sarah Teichmann’s name. It is incorrectly spelled “Teichman” three times throughout the manuscript, including in the Acknowledgment section where she is thanked for specific comments on the manuscript.
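
The point made above about clustering methods being tuned with numerous parameters can be illustrated with a toy sketch. The code below is not the Louvain method and is not taken from any of the atlas papers; it is a deliberately simplified stand-in (one-dimensional single-linkage clustering with a gap threshold). The function name `count_clusters`, the `gap_threshold` parameter, and the scores are all hypothetical and made up for illustration. The same twelve "cells" yield anywhere from one to twelve "cell types" depending on a single parameter:

```python
def count_clusters(values, gap_threshold):
    """Count single-linkage clusters of 1-D values.

    A new cluster starts wherever two consecutive sorted values
    differ by more than gap_threshold (a hypothetical tuning knob).
    """
    ordered = sorted(values)
    if not ordered:
        return 0
    clusters = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > gap_threshold:
            clusters += 1
    return clusters


# Made-up 1-D "expression scores" for 12 cells (illustration only).
scores = [0, 1, 2, 10, 11, 12, 30, 31, 32, 39, 40, 41]

if __name__ == "__main__":
    # The data never change; only the threshold does.
    for threshold in (0, 5, 7, 10, 20):
        print(f"threshold={threshold:>2}  clusters={count_clusters(scores, threshold)}")
```

Real pipelines expose many more such knobs (for example the number of principal components, the neighborhood size, and the clustering resolution), which is part of why evaluating their accuracy is challenging.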

A broader point regarding cell atlases is that defining cell types, distinguishing cell types from cell states, and comprehensively organizing cells in any species in a meaningful framework, is a monumental task that we are only beginning to tackle. There are no definitive human or mouse cell atlases yet, and there won’t be for some time. Among the “atlases” published so far there is little consensus. The Tabula Muris cell atlas annotates far fewer cell types than Han et al., 2018, perhaps because the latter assayed many more cells. Similarly, the fly cell atlas by Li et al., 2021 lists ~250 cell types in comparison to the Tabula Sapiens that finds ~400 in human. Perhaps these similar numbers do not reflect fundamental shared biology or a universal organizing principle for cells, but rather the fact that both projects sequenced similar numbers of cells (~580k vs. 500k respectively). Unsurprisingly, the number of annotated cell types in publications is strongly correlated with the number of cells assayed:

Cluster and cell numbers. The number of cells studied versus the number of clusters or cell types reported in a study. Red curves correspond to linear regression stratified to five quantiles of ‘Reported cells total’.
Figure from Svensson et al., 2020.
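
One way to see why the number of annotated cell types might track the number of cells assayed is a simple sampling calculation. The sketch below assumes a hypothetical tissue with 500 cell types whose abundances follow a Zipf-like distribution, and assumes that a type counts as "detected" as soon as a single cell of it is sampled. Neither assumption comes from Svensson et al. or from any atlas paper, and the function names are illustrative only. Under those assumptions, the expected number of detected types when n cells are sampled is the sum over types of 1 - (1 - p_i)^n, where p_i is the frequency of type i:

```python
def zipf_frequencies(num_types):
    """Hypothetical Zipf-like abundances: type i has weight 1/i, normalized to sum to 1."""
    weights = [1.0 / rank for rank in range(1, num_types + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


def expected_types_detected(frequencies, cells_sampled):
    """Expected number of types seen at least once among cells_sampled independent draws."""
    return sum(1.0 - (1.0 - p) ** cells_sampled for p in frequencies)


if __name__ == "__main__":
    frequencies = zipf_frequencies(500)  # 500 types is an assumption, not a measurement
    for n in (1_000, 10_000, 100_000):
        detected = expected_types_detected(frequencies, n)
        print(f"{n:>7} cells sampled -> about {detected:.1f} types detected")
```

In this toy model the biology is held fixed and only the sampling depth changes, yet the count of "cell types" keeps rising until the rare types saturate. Real annotation requires more than one cell per type, so the sketch if anything understates the dependence on depth.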

The brain presents an especially daunting challenge. An entire recent issue of Nature was devoted to only one region: the primary motor cortex. Frankly, opinion pieces elbowing for priority claims are neither appropriate nor interesting. To the extent that the human cell atlas will ever become a meaningful accomplishment it will have been a project without a single winner. Instead, it will have been a collaborative effort of thousands of scientists from across the world who will have deepened our understanding of biology to the benefit of all.
