[UPDATE January 21, 2023: Professor Ting Guo has been fired from UC Davis]

The UC Davis Young Scholars Program is a summer residential program that provides high school students the opportunity to work one-on-one with research faculty in state-of-the-art labs for six weeks. One of the faculty mentors that the program recently showcased on its Facebook page is Chemistry Professor Ting Guo, who has been a faculty mentor in the program for many years.

Professor Guo, who was the chairman of the UC Davis Chemistry Department from 2016-2018, has been mentoring high school students for over a decade. Already in 2010, he was awarded the Chancellor’s Achievement Award for Diversity and Community by then Chancellor Linda Katehi. In January of that year, he started mentoring a high school student, who had written to him asking whether she could shadow him at work for an assignment from her AP Chemistry teacher. She had written to several professors and he was the first to reply affirmatively.

Warning: what follows contains descriptions of violence, sexual assault, and other traumatic events. You can read a summary by skipping to “The end“.

In 2018, the high school student from 2010 who had shadowed Professor Guo for an AP Chemistry assignment, and was by then at UC Santa Barbara, contacted a UCSB Police Department detective to report that she had been repeatedly sexually assaulted by Professor Guo in 2010. This is detailed in a lawsuit (CV2020-1704) filed by the student against Professor Guo and the Board of Regents of the University of California, Davis. The filing describes an alleged incident on August 7, 2010, where the high school student (now the plaintiff) presented Professor Guo with some gifts (per her cultural custom) and offered to help him carry them home. At his house he allegedly offered her beer (which she declined because she was underage), and they apparently talked about Star Wars and his complete collection of the movies. Below is an excerpt from her statement to the UCSB police that is reproduced in the filing:

The plaintiff alleges that a few months later, by September 2010, she had been sexually assaulted three times:

The details are painful and poignant. After the second assault he allegedly offered her $60: “She refused and felt disrespected. But then he said to give it to her mom.” And as is often the case when massive power differentials are at play, the victim “carried on like normal- like nothing strange had happened because she did not want to face it or deal with it or process it. The plaintiff didn’t want to believe that Professor Guo was that kind of person.” I was heartbroken reading the following passage describing the plaintiff’s frame of mind after the first sexual assault:

The plaintiff was also scared:

The allegation that “he had spanked her in the past” is elaborated on in the filing:

According to the filing, the report that was filed with police at UCSB followed therapy sessions and a meeting with a CARE counselor at UCSB. It included not only a statement by the victim, but text messages with friends about the events when they happened. The UCSB police forwarded the report to police at UC Davis, who spoke to Professor Guo. He denied anything had happened.

Turning a blind eye

You might think that UC Davis, which became aware of the allegations in 2018 when the UCSB police report was forwarded to the UC Davis police, and which certainly reviewed the allegations in the lawsuit filed in 2020, would at least protect high school students by not allowing Professor Guo to interact with them until the truth, or falsehood, of the allegations against him could be established. At universities, investigations of allegations against a professor can take a long time, and it is understandable that a university would afford professors a presumption of innocence until determination of guilt or innocence is complete (although to be clear, the timescale of investigations is frequently not reasonable at all). In any case, the possibility of guilt, in a case where violence and sexual assault are alleged, demands protection of students in the interim.
Protection, at a minimum, would entail not allowing Professor Guo to mentor high school students and refusing him the privilege of serving as a mentor in the Young Scholars Program. This would be a limitation, but not one that is very restrictive for a professor. Of course, one would hope that UC Davis would also protect undergraduate students, graduate students and postdocs, but again, at least, one would hope, UC Davis would protect high school students. However, UC Davis allowed Professor Guo to continue mentoring high school students up until 2021, as the Facebook post shown at the top of this post demonstrates. In fact, Professor Guo mentored a high school student by the name of Jonathan Ma in 2019, after UC Davis knew about the allegations against Professor Guo. Below is an excerpt from an article in the St. Louis Post-Dispatch dated July 22, 2019 about the student and his summer experience in Professor Ting Guo’s lab:

Tampering with evidence

In 2019 California changed the statute of limitations for adult survivors of sexual abuse from 3 years to 10 years. Assaults that occurred before January 1, 2019, can be held to the three-year limit. For this reason, the court sustained demurrers by Ting Guo and the Board of Regents of the University of California against the plaintiff in the CV2020-1704 lawsuit. There will be no trial to establish the truth or falsehood of the allegations.

Now suppose you were an administrator at UC Davis, and you believed that the allegations against Professor Ting Guo were FALSE. Suppose you believed that Professor Ting Guo was INNOCENT. Why would you tamper with websites simply showing that Professor Guo regularly mentored high school students via the UC Davis Young Scholars Program? After all, you would believe him to be an INNOCENT man… so what would there be to hide?
Well…it turns out that recently websites of the Young Scholars Program were tampered with to remove all evidence of Professor Ting Guo’s involvement with the program 👀 For example, consider student Sean Wu who participated in the Young Scholars Program in 2018, and was mentored by Professor Ting Guo:

This screenshot is from a November 3, 2022 cache of a Young Scholars Program website taken at 13:23:23 GMT describing the research projects undertaken in 2018 (the link is to a copy on the Wayback Machine). Today, the website looks like this:

The project by Sean Wu in Professor Ting Guo’s lab has simply been… deleted. On another Young Scholars Program website, the project is still listed, but the mentor has been changed from Professor Guo to Jennifer Lien, who is a postdoc in the Guo lab (she was formerly a graduate student in the lab and has been there 11 years):

Several other Young Scholars Program high school students who worked in Professor Guo’s lab, and who had previously listed him as their mentor on the Young Scholars Program websites, have just had their mentor retroactively changed to Jennifer Lien by edits to the website. These include Jonathan Ma (the student from 2019 who is mentioned above), and another student Susan Garcia (2017). I wonder who chose Jennifer Lien to replace Ting Guo as the mentor of the students, in some cases more than 5 years after the fact. Susan Garcia’s project was also deleted from this website. In fact, a page dedicated to her project now returns an “Access denied” error:

This page existed previously, as evident from a Google search which shows it hosted the abstract for the work (other abstracts from that year are all available on functioning websites):

In addition, the Facebook post shown at the top of this post was also deleted.
The cover up was sloppy (the need to scrub Professor Guo’s website was seemingly overlooked [UPDATE January 21, 2023: the website has now been removed]), but whoever did this clearly wanted to hide the fact that Professor Ting Guo mentored high school students via the Young Scholars Program. The digital tampering that was performed is reminiscent of one of the scandals that led to Chancellor Linda Katehi’s resignation “under fire” in 2016, when she was being investigated for using university money to try to remove negative online search results about herself. Seriously, what is going on at UC Davis?

The end

In summary, a high school student working in UC Davis Chemistry professor Ting Guo’s lab in 2010 alleged in a police report filed in 2018 that she was sexually assaulted by him multiple times. In 2020, she filed a lawsuit against Professor Guo and The Board of Regents of the University of California, Davis. UC Davis continued to allow Professor Guo access to high school students via the Young Scholars Program even after finding out about the serious allegations against him. Recently, websites of the Young Scholars Program have been altered or deleted to remove any evidence showing that Professor Guo was ever a mentor in the program. How many more such cases are there that have not seen the light of day because evidence was more effectively tampered with? How many universities are wiping their records to hide evidence of their negligence in protecting students? How many more women must suffer? Will we ever see the end?

A recent opinion piece titled “A decade of molecular cell atlases” by Stephen Quake narrates the incredible single-cell genomics technology advances that have taken place over the last decade, and how they have translated to increasingly resolved cell atlases.
While the article tells some fascinating stories (apparently when hearing a report about the CZI mouse cell atlas Priscilla Chan remarked “why don’t we just do human?” and thus the idea for a human cell atlas was born), it contains several errors and omissions. I have summarized some of them below, and have sent a copy to Trends in Genetics, where the opinion piece was published, requesting corrections [Update April 5, 2022: Trends in Genetics rejected my submission; it is posted on the arXiv]:

• Quake writes that “The year 2017 marked the release of our Tabula Muris data set and preprint” and cites the 2nd version of a preprint on the bioRxiv posted on March 29, 2018, that later became an article in Nature on October 3, 2018. While it is true that the preprint was first posted on December 20, 2017, at that time the data was not released. The data was only released in the 2nd version of the preprint on March 29, 2018 (the data was made publicly available on GEO at accession GSE109774 on March 19, 2018). Without the data, namely the reads which were processed, it is not possible to verify or reproduce results from a paper, nor, in the case of single-cell transcriptomics, is it possible to build on work by uniformly processing it together with a new dataset for joint analysis. Notably, Quake’s false claim that the preprint and data were released in 2017 is a repeat of what he stated in a lecture titled “The Cell is a Bag of RNA” (an apt analogy of David Lilley). In the lecture Quake specifically said that “in this whole gap [from December 20, 2017 until the paper was published in October 2018] where normally you wouldn’t have access to the paper or the data, the whole world did because we put it on the [bio]arXiv… not just the manuscript but all the data.”

• The error in describing the date when the Tabula Muris data was shared is significant in light of Quake’s narrative that Tabula Muris was “the first mammalian whole-organism cell atlas”.
Quake describes another single-cell RNA-seq based mouse cell atlas published by Guoji Guo’s group at Zhejiang University on February 22, 2018 (Han et al., Mapping the Mouse Cell Atlas by Microwell-Seq, Cell), as “further work”. Guo’s data was publicly available on GEO on February 14, 2018 (GSE108097) along with the publication, a date that preceded the release of the Tabula Muris data. In fact, in the Tabula Muris preprint update on March 29, 2018, the Han et al. 2018 paper data is analyzed in conjunction with the Tabula Muris data, with the authors concluding that “independent datasets generated from various atlases that are beginning to arise can be combined and collectively analyzed…”. Thus, it was the Tabula Muris paper that was “further work” following the Han et al. 2018 paper, not the other way around. The timeline that Quake presents is shown on the left (screenshot from his “The Cell is a Bag of RNA” talk); the actual (edits by me in red) timeline is on the right:

• Quake mischaracterizes another paper from the Guo group, namely Han et al., Construction of a human cell landscape at single-cell level, Nature, 2020. He refers to this paper as one of several that represent “a distinct strategy of compiling cell atlases one tissue at a time.” However, the Han et al. 2020 paper analyzed samples of both fetal and adult tissue, and covered 60 human tissue types which were assayed in (2-4) replicates. Han et al. 2020 also examined several types of cell culture, including induced pluripotent stem cells, embryoid body cells, haematopoietic cells derived from co-cultures of human H9 and mouse OP9 cells, and pancreatic beta cells derived from H9 cells. The scope of the Han et al. 2020 paper is apparent in Figure 1 of their paper:

• Quake similarly misrepresents two other human cell atlas papers, namely He et al. Single-cell transcriptome profiling of an adult human cell atlas of 15 major organs, Genome Biology, 2020.
This paper, whose title makes clear it mapped cell types in 15 organs, is also described by Quake as “representing a distinct strategy of compiling cell atlases one tissue at a time” implying that only one tissue or organ was assayed. The same is the case for Cao et al., A human cell atlas of fetal gene expression, Science 2020, which derived cell types using single-cell gene expression and chromatin landscape data from 15 organs.

• When highlighting “his” Tabula Sapiens, which was first preprinted in 2021, Quake fails to mention the human cell atlas papers above, and instead mentions only preprints from the Broad and Sanger which were also published in 2021, and a paper from what he calls a “Swedish consortium” (the work was conceived and designed by Mathias Uhlén and Cecilia Lindskog). This omission makes it seem that Tabula Sapiens was the first human cell atlas to be published, along with a handful of others preprinted at the same time and one published concurrently, when in fact that was not the case.

• Quake characterizes the Tabula Muris as “representing the first mammalian whole-organism cell atlas.” As noted above, it was not “first”, but priority claims aside, the description as a “whole-organism cell atlas” needs to be qualified. Here is how the project is characterized in the published paper: “Although these data are by no means a complete representation of all mouse organs and cell types, they provide a first draft attempt to create an organism-wide representation of cellular diversity.”

• In reviewing the technology developments that led to high-throughput single-cell RNA-seq, Quake omits several important advances. There is a large body of work to refer to and cite, including several key advances in barcoding of beads to identify cells and barcoding for distinguishing molecules. For the latter, see, e.g., Shiroguchi et al., 2012 (from the lab of Sunney Xie).
• The paper declares 2011-2012 to be “seminal years” in conceptualizing the notion of a transcriptomic cell atlas. While it’s true that those were “seminal” years for Quake when he published his own sperm (Wang et al., 2012), the timeline seems arbitrary and possibly self-serving. The Tang et al. paper in 2009 could just as well be taken as the starting point for “conceptualizing the notion of a transcriptomic cell atlas”. Tang et al. write specifically that “For example, mouse embryonic stem cells, probably the most thoroughly analyzed type of stem cells, contain multiple subpopulations with strong differences in both gene expression and physiological function. Therefore, a more sensitive mRNA-Seq assay, ideally an assay capable of working at single cell resolution, is needed to meaningfully study crucial developmental processes and stem cell biology.” Similarly, Long et al., in “A 3D digital atlas of C. elegans and its application to single-cell analyses” published in 2009, were anticipating the notion of a transcriptomic cell atlas, noting that their technology would be particularly useful for “high-throughput analysis of cellular information such as gene expression at single-cell resolution.” Alternatively, a reasonable starting point for consideration could be 2014, when cells actually started to be assayed en masse:

• The Long et al. paper brings to the fore the field of spatial transcriptomics, which Quake ignores entirely in his review. However, scientists in that field were already conceptualizing the notion of a transcriptomic cell atlas; in fact spatial transcriptomics was arguably the domain where most of the ideas pervasive in single-cell genomics today originated (see the Museum of Spatial Transcriptomics for a detailed history and review).

• Another important omission in the Quake opinion is the discussion of the computational biology technologies crucial for cell atlases.
None of the Tabula papers, or for that matter any of the single-cell transcriptomics papers that have been written during the past few years, would have been possible without the Seurat and Scanpy programs from the Satija and Theis labs respectively. More importantly, the atlases themselves are, fundamentally, a product of the computational tools used to analyze the data. For example, in the Tabula Microcebus the annotated cell types were obtained by analyzing the 10X Genomics single-cell RNA-seq data “through dimensionality reduction by principal component analysis, visualization in 2-dimensional projections with t-Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), and clustering by the Louvain method.” These methods are tuned with numerous parameters and even evaluating when they are accurate is challenging (Kiselev et al., 2019). With regard to dimensionality reduction, numerous problems have been documented with t-SNE and UMAP, leading to questions about results based on interpreting them (see, e.g., Chari et al., 2021, Cooley et al., 2022). The only mention of computational technology in the article is a comment that “Similarly, the development of new algorithms and computational approaches was also a powerful enabler of the field as it now exists.”

• The omission of computational technology is chalked up to space constraints, yet there was apparently enough space for Quake to narrate an origin story of the Human Cell Atlas project in which he centers himself, instead of Aviv Regev and Sarah Teichmann whose contribution was much more than to have “asked whether various efforts should be merged into an international collaboration”. They were early champions of a collaborative human cell atlas project and have co-chaired the organizing committee from the outset.
Teichmann co-founded the Wellcome Trust Sanger Institute Single Cell Genomics Centre in 2013, and by 2015 had been awarded the EMBO Gold Medal in part for her contributions to, and vision for, single-cell transcriptomics. Regev pioneered many of the single-cell RNA-seq technology developments that enabled single-cell genomics, including single-cell studies of immune cells in 2013 and spatial single-cell RNA-seq in 2015. By the time of the inaugural Human Cell Atlas meeting in London in 2016, Regev had been widely publicizing a vision for a “periodic table of cells” and Teichmann had joined forces with her to develop a joint vision to accomplish the task (see article in the Pacific Standard, 2018).

• There are a few minor errors in the paper. Quake writes that “These [microfluidics automation technologies] were eventually commercialized by a company I founded called Fluidigm..”. In fact, Quake did not found Fluidigm by himself; the company was co-founded with Gajus Worthington. Miriam Merad’s name is incorrectly spelled as “Meriam” and Christophe Benoist’s name is incorrectly spelled as “Benoiste”. A recurring typo is the misspelling of Sarah Teichmann’s name. It is incorrectly spelled “Teichman” three times throughout the manuscript, including in the Acknowledgment section where she is thanked for specific comments on the manuscript.

A broader point regarding cell atlases is that defining cell types, distinguishing cell types from cell states, and comprehensively organizing cells in any species in a meaningful framework, is a monumental task that we are only beginning to tackle. There are no definitive human or mouse cell atlases yet, and there won’t be for some time. Among the “atlases” published so far there is little consensus. The Tabula Muris cell atlas annotates far fewer cell types than Han et al., 2018, perhaps because the latter assayed many more cells.
Similarly, the fly cell atlas by Li et al., 2021 lists ~250 cell types in comparison to the Tabula Sapiens that finds ~400 in human. Perhaps these similar numbers do not reflect fundamental shared biology or a universal organizing principle for cells, but rather the fact that both projects sequenced similar numbers of cells (~580k vs. 500k respectively). Unsurprisingly, the number of annotated cell types in publications is strongly correlated with the number of cells assayed:

The brain presents an especially daunting challenge. An entire recent issue of Nature was devoted to only one region: the primary motor cortex.

Frankly, opinion pieces elbowing for priority claims are neither appropriate nor interesting. To the extent that the human cell atlas will ever become a meaningful accomplishment it will have been a project without a single winner. Instead, it will have been a collaborative effort of thousands of scientists from across the world who will have deepened our understanding of biology to the benefit of all.

If you haven’t heard about Clubhouse yet… well, it’s the latest Silicon Valley unicorn, and the popular new chat hole for thought leaders. I heard about it for the first time a few months ago, and was kindly offered an invitation (Clubhouse is invitation only!) so I could explore what it is all about. Clubhouse is an app for audio based social networking, and the content is, as far as I can tell, a mixed bag. I’ve listened to a handful of conversations hosted on the app; topics include everything from bitcoin to Miami. It was interesting, at times, to hear the thoughts and opinions of some of the discussants. On the other hand, there is a lot of superficial rambling on Clubhouse as well. During a conversation about genetics I heard someone posit that biology has a lot to learn from the fashion industry.
This was delivered in a “you are hearing something profound” manner, by someone who clearly knew nothing about either biology or the fashion industry, which is really too bad, because the fashion industry is quite interesting and I wouldn’t be surprised at all if biology has something to learn from it. Unfortunately, I never learned what that is. One of the regulars on Clubhouse is Noor Siddiqui. You may not have heard of her; in fact she is officially “not notable”. That is to say, she used to have a Wikipedia page but it was deleted on the grounds that there is nothing about her that indicates notability, which is of course notable in and of itself… a paradox that says more about Wikipedia’s gatekeeping than Siddiqui (Russell 1903, Litt 2021). In any case, Siddiqui was recently part of a Clubhouse conversation on “convergence of genomics and reproductive technology” together with Carlos Bustamante (advisor to cryptocurrency based Luna DNA and soon to be professor of business technology at the University of Miami) and Balaji Srinivasan (bitcoin angel investor and entrepreneur). As it happens, Siddiqui is the CEO of a startup called “Orchid Health“, in the genomics and reproductive technology “space”. The company promises to harness “population genetics, statistical modeling, reproductive technologies, and the latest advances in genomic science” to “give parents the option to lower a future child’s genetic risk by creating embryos through in IVF and implanting embryos in the order that can reduce disease risk.” This “product” will be available later this year. Bustamante and Srinivasan are early “operators and investors” in the venture. Orchid is not Siddiqui’s first startup. While she doesn’t have a Wikipedia page, she does have a website where she boasts of having (briefly) been a Thiel fellow and, together with her sister, starting a company as a teenager. 
The idea of the (briefly in existence) startup was apparently to help the now commercially defunct Google Glass gain acceptance by bringing the device to the medical industry. According to Siddiqui, Orchid is also not her first dive into statistical modeling or genomics. She notes on her website that she did “AI and genomics research”, specifically on “deep learning for genomics”. Such training and experience could have been put to good use but…

Polygenic risk scores and polygenic embryo selection

Orchid Health claims that it will “safely and naturally, protect your baby from diseases that run in your family” (the slogan “have healthy babies” is prominently displayed on the company’s website). The way it will do this is to utilize “advances in machine learning and artificial intelligence” to screen embryos created through in-vitro fertilization (IVF) for “breast cancer, prostate cancer, heart disease, atrial fibrillation, stroke, type 2 diabetes, type 1 diabetes, inflammatory bowel disease, schizophrenia and Alzheimer’s“. What this means in (a statistical geneticist’s) layman’s terms is that Orchid is planning to use polygenic risk scores derived from genome-wide association studies to perform polygenic embryo selection for complex diseases. This can be easily unpacked because it’s quite a simple proposition, although it’s far from a trivial one: the statistical genetics involved is deep and complicated.

First, a single-gene disorder is a health problem that is caused by a single mutation in the genome. Examples of such disorders include Tay-Sachs disease, sickle cell anaemia, Huntington’s disease, Duchenne muscular dystrophy, and many other diseases. A “complex disease”, also called a multifactorial disease, is a disease that has a genetic component, but one that involves multiple genes, i.e. it is not a single-gene disorder.
Crucially, complex diseases may involve effects of environmental factors, whose role in causing disease may depend on the genetic composition of an individual. The diseases listed on Orchid’s website, including breast cancer, prostate cancer, heart disease, atrial fibrillation, stroke, type 2 diabetes, type 1 diabetes, inflammatory bowel disease, schizophrenia and Alzheimer’s disease, are all examples of complex (multifactorial) diseases.

To identify genes that associate with a complex disease, researchers perform genome-wide association studies (GWAS). In such studies, researchers typically analyze several million genomic sites in a large number of individuals with and without a disease (it used to be thousands of individuals; nowadays it is hundreds of thousands or millions) and perform regressions to assess the marginal effect at each locus. I italicized the word associate above, because genome-wide association studies do not, in and of themselves, point to genomic loci that cause disease. Rather, they produce, as output, lists of genomic loci that have varying degrees of association with the disease or trait of interest.

Polygenic risk scores (PRS), which the Broad Institute claims to have discovered (narrator: they were not discovered at the Broad Institute), are a way to combine the multiple genetic loci associated with a complex disease from a GWAS. Specifically, a PRS $\hat{S}$ for a complex disease is given by $\hat{S} = \sum_{j=1}^m X_j \hat{\beta}_j,$ where the sum is over $m$ different genetic loci, the $X_j$ are coded genetic markers for an individual at the $m$ loci, and the $\hat{\beta}_j$ are weights based on the marginal effects derived from a GWAS. The concept of a PRS is straightforward, but the details are complicated, in some cases subtle, and generally non-trivial.
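To make the formula for $\hat{S}$ concrete, here is a minimal sketch of the weighted sum for one individual. Everything specific in it is an assumption made for illustration: the function name `polygenic_risk_score`, the coding of each $X_j$ as a count of risk alleles (0, 1 or 2), and the four made-up weights, which do not come from any real GWAS.

```python
from typing import Sequence


def polygenic_risk_score(genotypes: Sequence[float], weights: Sequence[float]) -> float:
    """Return S_hat = sum over j of X_j * beta_hat_j for a single individual."""
    if len(genotypes) != len(weights):
        raise ValueError("genotypes and weights must cover the same m loci")
    return sum(x * b for x, b in zip(genotypes, weights))


# Hypothetical data for m = 4 loci (illustrative only, not from a real study).
genotypes = [0, 1, 2, 1]             # X_j: assumed coding, number of risk alleles
weights = [0.10, -0.05, 0.20, 0.00]  # beta_hat_j: made-up marginal effect sizes

score = polygenic_risk_score(genotypes, weights)
print(f"PRS = {score:.2f}")
```

The arithmetic is trivial, which is the point: the difficulty lies in choosing which loci to include and how to set the weights, not in evaluating the sum.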
There is debate over how many genomic loci should be used in computing a polygenic risk score given that the vast majority of marginal effects are very close to zero (Janssens 2019), lots of ongoing research about how to set the weights to account for issues such as bias caused by linkage disequilibrium (Vilhjálmsson et al. 2015, Shin et al. 2017, Newcombe et al. 2019, Ge et al. 2019, Lloyd-Jones et al. 2019, Pattee and Pan 2020, Song et al. 2020), and continuing discussions about the ethics of using polygenic risk scores in the clinic (Lewis and Green 2021).

While much of the discussion around PRS centers on applications such as determining diagnostic testing frequency (Wald and Old 2019), polygenic embryo selection (PES) posits that polygenic risk scores should be taken a step further and evaluated for embryos to be used as a basis for discarding, or selecting, specific embryos for in vitro fertilization implantation. The idea has been widely criticized and critiqued (Karavani et al. 2019). It has been described as unethical and morally repugnant, and concerns about its use for eugenics have been voiced by many. Underlying these criticisms is the fact that the technical issues with PES using PRS are manifold.

Poor penetrance

The term “penetrance” for a disease refers to the proportion of individuals with a particular genetic variant that have the disease. Many single-gene disorders have very high penetrance. For example, the F508del mutation in the CFTR gene is 100% penetrant for cystic fibrosis. That is, 100% of people who are homozygous for this variant, meaning that both copies of their DNA have a deletion of the phenylalanine amino acid in position 508 of their CFTR gene, will have cystic fibrosis. The vast majority of variants associated with complex diseases have very low penetrance.
For example, in schizophrenia, the penetrance of “high risk” de novo copy number variants (in which there are variable copies of DNA at a genomic locus) was found to be between 2% and 7.4% (Vassos et al. 2010). The low penetrance at large numbers of variants for complex diseases was precisely the rationale for developing polygenic risk scores in the first place, the idea being that while individual variants yield small effects, perhaps in (linear) combination they can have more predictive power. While it is true that combining variants does yield more predictive power for complex diseases, unfortunately the accuracy is, in absolute terms, very low.

The reason for low predictive power of PRS is explained well in (Wald and Old 2020) and is illustrated for coronary artery disease (CAD) in (Rotter and Lin 2020):

The issue is that while the polygenic risk score distribution may indeed be shifted for individuals with a disease, and while this shift may be statistically significant resulting in large odds ratios, i.e. much higher relative risk for individuals with higher PRS, the proportion of individuals in the tail of the distributions who will or won’t develop the disease will greatly affect the predictive power of the PRS. For example, Wald and Old note that PRS for CAD from (Khera et al. 2018) will confer a detection rate of only 15% with a false positive rate of 5%. At a 3% false positive rate the detection rate would be only 10%. This is visible in the figure above, where it is clear that control of the false positive rate (i.e. thresholding at the extreme right-hand side with high PRS score) will filter out many (most) affected individuals.

The same issue is raised in the excellent review on PES of (Lázaro-Muñoz et al. 2020).
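The trade-off between false positive rate and detection rate can be sketched with a short calculation. This is an illustration under stated assumptions, not a reproduction of the Wald and Old analysis: it assumes PRS values are Gaussian with equal variance in affected and unaffected individuals, and it uses a hypothetical shift of 0.6 standard deviations between the two groups (`ASSUMED_SHIFT_SD`), chosen only because it yields detection rates close to the figures quoted above. The function name `detection_rate` is likewise made up for this example.

```python
from statistics import NormalDist


def detection_rate(false_positive_rate: float, shift_sd: float) -> float:
    """Fraction of affected individuals above the PRS cutoff that flags
    `false_positive_rate` of unaffected individuals.

    Assumed model: unaffected PRS ~ N(0, 1), affected PRS ~ N(shift_sd, 1).
    """
    std_normal = NormalDist()
    # Cutoff such that exactly `false_positive_rate` of unaffected scores exceed it.
    cutoff = std_normal.inv_cdf(1.0 - false_positive_rate)
    # Affected scores are the same bell curve moved right by `shift_sd`.
    return 1.0 - std_normal.cdf(cutoff - shift_sd)


ASSUMED_SHIFT_SD = 0.6  # hypothetical value, not taken from any cited paper

for fpr in (0.05, 0.03):
    dr = detection_rate(fpr, ASSUMED_SHIFT_SD)
    print(f"false positive rate {fpr:.0%}: detection rate {dr:.1%}")
```

Under these assumptions the detection rates come out at roughly 15% and 10% respectively. Tightening the cutoff to reduce false positives necessarily pushes it further into the right-hand tail, where most affected individuals are not found.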
Lázaro-Muñoz et al. explain that “even if a PRS in the top decile for schizophrenia conferred a nearly fivefold increased risk for a given embryo, this would still yield a >95% chance of not developing the disorder.” It is worth noting in this context that diseases like schizophrenia are not even well defined phenotypically (Mølstrøm et al. 2020), which is another complex matter that is too involved to go into in detail here.

In a recent tweet, Siddiqui describes natural conception as a genetic lottery, and suggests that Orchid Health, by performing PES, can tilt the odds in customers’ favor. To do so the false positive rate must be low, or else too many embryos will be discarded. But a 15% sensitivity is highly problematic considering the risks inherent in IVF in the first place (Kamphuis et al. 2014): To be concrete, an odds ratio of 2.8 for cerebral palsy needs to be balanced against the fact that in the Khera et al. study, only 8% of individuals had an odds ratio >3.0 for CAD. Other diseases are even worse, in this sense, than CAD. In atrial fibrillation (one of the diseases on Orchid Health’s list), only 9.3% of the individuals in the top 0.44% of the atrial fibrillation PRS actually had atrial fibrillation (Choi et al. 2019).

As one starts to think carefully about the practical aspects and tradeoffs in performing PES, other issues, resulting from the low penetrance of complex disease variants, come into play as well. (Lencz et al. 2020) examine these tradeoffs in detail, and conclude that “the differential performance of PES across selection strategies and risk reduction metrics may be difficult to communicate to couples seeking assisted reproductive technologies… These difficulties are expected to exacerbate the already profound ethical issues raised by PES… which include stigmatization, autonomy (including “choice overload”), and equity.
In addition, the ever-present specter of eugenics may be especially salient in the context of the LRP (lowest-risk prioritization) strategy.” They go on to “call for urgent deliberations amongst key stakeholders (including researchers, clinicians, and patients) to address governance of PES and for the development of policy statements by professional societies.”

Pleiotropy predicaments

I remember a conversation I had with Nicolas Bray several years ago shortly after the exciting discovery of CRISPR/Cas9 for genome editing, on the implications of the technology for improving human health. Nick pointed out that the development of genomics had been curiously “backwards”. Thirty years ago, when human genome sequencing was beginning in earnest, the hope was that with the sequence at hand we would be able to start figuring out the function of genes, and even individual base pairs in the genome. At the time, the human genome project was billed as being able to “help scientists search for genes associated with human disease” and it was imagined that “greater understanding of the genetic errors that cause disease should pave the way for new strategies in diagnosis, therapy, and disease prevention.” Instead, what happened is that genome editing technology has arrived well before we have any idea of what the vast majority of the genome does, let alone the implications of edits to it. Similarly, while the coupling of IVF and genome sequencing makes it possible to select embryos based on genetic variants today, the reality is that we have no idea how the genome functions, or what the vast majority of genes or variants actually do.

One thing that is known about the genome is that it is chock full of pleiotropy. This is statistical genetics jargon for the fact that variation at a single locus in the genome can affect many traits simultaneously.
Whereas one might think naïvely that there are distinct genes affecting individual traits, in reality the genome is a complex web of interactions among its constituent parts, leading to extensive pleiotropy. In some cases pleiotropy can be antagonistic, which means that a genomic variant may simultaneously be harmful and beneficial. A famous example of this is the mutation to the beta globin gene that confers malaria resistance to heterozygotes (individuals with just one of their DNA copies carrying the mutation) and sickle cell anemia to homozygotes (individuals with both copies of their DNA carrying the mutation).

In the case of complex diseases we don’t really know enough, or anything, about the genome to be able to truly assess pleiotropy risks (or benefits). But there are some worries already. For example, HLA Class II genes are associated with Type 1 and non-insulin treated Type 2 diabetes (Jacobi et al. 2020), Parkinson’s disease (e.g. James and Georgopoulos 2020, which also describes an association with dementia) and Alzheimer’s (Wang and Xing 2020). PES that results in selection against the variants associated with these diseases could very well lead to population susceptibility to infectious disease. Having said that, it is worth repeating that we don’t really know if the danger is serious, because we don’t have any idea what the vast majority of the genome does, nor the nature of antagonistic pleiotropy present in it. Almost certainly by selecting for one trait according to PRS, embryos will also be selected for a host of other unknown traits.

Thus, what can be said is that while Orchid Health is trying to convince potential customers to not “roll the dice”, by ignoring the complexities of pleiotropy and its implications for embryo selection, what the company is actually doing is in fact rolling the dice for its customers (for a fee).
Population problems

One of Orchid Health’s selling points is that unlike other tests that “look at 2% of only one partner’s genome…Orchid sequences 100% of both partner’s genomes” resulting in “6 billion data points”. This refers to the “couples report”, which is a companion product of sorts to the polygenic embryo screening. The couples report is assembled by using the sequenced genomes of parents to simulate the genomes of potential babies, each of which is evaluated by PRS to provide a range of (PRS based) disease predictions for the couple’s potential children. Sequencing a whole genome is a lot more expensive than just assessing single nucleotide polymorphisms (SNPs) in a panel. That may be one reason that most direct-to-consumer genetics is based on polymorphism panels rather than sequencing. There is another: the vast majority of variation in the genome occurs at known polymorphic sites (there are a few million out of the approximately 3 billion base pairs in the genome), and to the extent that a variant might associate with a disease, it is likely that a neighboring common variant, which will be inherited together with the causal one, can serve as a proxy. There are rare variants that have been shown to associate with disease, but whether or not they can explain a large fraction of (genetic) disease burden is still an open question (Young 2019). So what has Siddiqui, who touts the benefits of whole-genome sequencing in a recent interview, discovered that others such as 23andme have missed?

It turns out there is value to whole-genome sequencing for polygenic risk score analysis, but it is when one is performing the genome-wide association studies on which the PRS are based. The reason is a bit subtle, and has to do with differences in genetics between populations.
Specifically, as explained in (De La Vega and Bustamante, 2018), variants that associate with a disease in one population may be different than variants that associate with the disease in another population, and whole-genome sequencing across populations can help to mitigate biases that result when restricting to SNP panels. Unfortunately, as De La Vega and Bustamante note, whole-genome sequencing for GWAS “would increase costs by orders of magnitude”. In any case, the value of whole-genome sequencing for PRS lies mainly in identifying relevant variants, not in assessing risk in individuals. The issue of population structure affecting PRS unfortunately transcends considerations about whole-genome sequencing. (Curtis 2018) shows that PRS for schizophrenia is more strongly associated with ancestry than with the disease. Specifically, he shows that “The PRS for schizophrenia varied significantly between ancestral groups and was much higher in African than European HapMap subjects. The mean difference between these groups was 10 times as high as the mean difference between European schizophrenia cases and controls. The distributions of scores for African and European subjects hardly overlapped.” The figure from Curtis’ paper showing the distribution of PRS for schizophrenia across populations is displayed below (the three letter codes at the bottom are abbreviations for different population groups; CEU stands for Northern Europeans from Utah and is the lowest). The dependence of PRS on population is a problem that is compounded by a general problem with GWAS, namely that Europeans and individuals of European descent have been significantly oversampled in GWAS. Furthermore, even within a single ancestry group, the prediction accuracy of PRS can depend on confounding factors such as socio-economic status (Mostafavi et al. 2020). Practically speaking, the implications for PES are beyond troubling. 
The PRS in the reports that customers of Orchid Health receive may be inaccurate or meaningless due to not only the genetic background or admixture of the parents involved, but also other unaccounted for factors. Embryo selection on the basis of such data is worse than just throwing dice: it can potentially lead to unintended consequences in the genomes of the selected embryos. (Martin et al. 2019) show unequivocally that clinical use of polygenic risk scores may exacerbate health disparities.

People pathos

The fact that Silicon Valley entrepreneurs are jumping aboard a technically incoherent venture and are willing to set aside serious ethical and moral concerns is not very surprising. See, e.g. Theranos, which was supported by its investors despite concerns being raised about the technical foundations of the company. After a critical story appeared in the Wall Street Journal, the company put out a statement that “[Bad stories]…come along when you threaten to change things, seeded by entrenched interests that will do anything to prevent change, but in the end nothing will deter us from making our tests the best and of the highest integrity for the people we serve, and continuing to fight for transformative change in health care.” While this did bother a few investors at the time, many stayed the course for a while longer. Siddiqui uses similar language, brushing off criticism by complaining about paternalism in the health care industry and gatekeeping, while stating that “We’re in an age of seismic change in biotech – the ability to sequence genomes, the ability to edit genomes, and now the unprecedented ability to impact the health of a future child.” Her investors, many of whom got rich from cryptocurrency trading or bitcoin, cheer her on.
One of her investors is Brian Armstrong, CEO of Coinbase, who believes “[Orchid is] a step towards where we need to go in medicine.” I think I can understand some of the ego and money incentives of Silicon Valley that drive such sentiment. But one thing that disappoints me is that scientists I personally held in high regard, such as Jan Liphardt (associate professor of Bioengineering at Stanford), who is on the scientific advisory board, and Carlos Bustamante (co-author of the paper about population structure associated biases in PRS mentioned above), who is an investor in Orchid Health, have associated themselves with the company. It’s also very disturbing that Anne Wojcicki, the CEO of 23andme, whose team of statistical geneticists understands the subtleties of PRS, still went ahead and invested in the company.

Conclusion

Orchid Health’s polygenic embryo selection, which it will be offering later this year, is unethical and morally repugnant. My suggestion is to think twice before sending them three years of tax returns to try to get a discount on their product.

Steven Miller is a math professor at Williams College who specializes in number theory and theoretical probability theory. A few days ago he published a “declaration” in which he performs an “analysis” of phone bank data of registered Republicans in Pennsylvania. The data was provided to him by Matt Braynard, who led Trump’s data team during the 2016 campaign.
Miller frames his “analysis” as an attempt to “estimate the number of fraudulent ballots in Pennsylvania”, and his analysis of the data leads him to conclude that “almost surely…the number of ballots requested by someone other than the registered Republican is between 37,001 and 58,914, and almost surely the number of ballots requested by registered Republicans and returned but not counted is in the range from 38,910 to 56,483.”

A review of Miller’s “analysis” leads me to conclude that his estimates are fundamentally flawed and that the data as presented provide no evidence of voter fraud.

This conclusion is easy to arrive at. The declaration claims (without a reference) that there were 165,412 mail-in ballots requested by registered Republicans in PA, but that “had not arrived to be counted” as of November 16th, 2020. The data Miller analyzed was based on an attempt to call some of these registered Republicans by phone to assess what happened to their ballots. The number of phone calls made, according to the declaration, is 23,184 = 17,000 + 3,500 + 2,684. The number 17,000 consists of phone calls that did not provide information either because an answering machine picked up instead of a person, or a person picked up and summarily hung up. 3,500 numbers were characterized as “bad numbers / language barrier”, and 2,684 individuals answered the phone. Curiously, Miller writes that “Almost 20,000 people were called”, when in fact 23,184 > 20,000.

In any case, clearly many of the phone numbers dialed were simply wrong numbers, as evidenced by the number of “bad” calls: 3,500. It’s easy to imagine how this can happen: confusion because some individuals share a name, phone numbers have changed, people move, the phone call bank makes an error when dialing, etc. Let $b$ be the fraction of phone numbers out of the 23,184 that were “bad”, i.e. incorrect.
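The bookkeeping behind the estimate of $b$ that follows can be laid out in a few lines. This is a minimal sketch using only counts quoted from the declaration; the 556 people who said they did not request a ballot are introduced in the next paragraph, and the variable and function names are my own.

```python
# Counts quoted from the declaration
NO_INFORMATION = 17_000   # answering machine picked up, or the person hung up
BAD_NUMBERS = 3_500       # "bad numbers / language barrier"
ANSWERED = 2_684          # individuals who answered the phone
DENIED_REQUEST = 556      # answered and said they did not request a ballot

total_calls = NO_INFORMATION + BAD_NUMBERS + ANSWERED


def estimate_bad_fraction(total, unknown, known_bad):
    """Solve total * b = unknown * b + known_bad for b.

    The calls with no information are assumed to contain the same
    fraction b of wrong numbers as the full set of calls.
    """
    return known_bad / (total - unknown)


b = estimate_bad_fraction(total_calls, NO_INFORMATION, BAD_NUMBERS + DENIED_REQUEST)
expected_bad_among_answered = b * ANSWERED

print(f"total calls: {total_calls}")
print(f"estimated fraction of bad numbers: {b:.3f}")
print(f"expected wrong numbers among those who answered: {expected_bad_among_answered:.0f}")
```

The only modeling choice here is that the unanswered calls are no more and no less likely to be wrong numbers than the calls overall; everything else is arithmetic on the reported counts.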
We can estimate $b$ by noting that we have some information about it: we know that the 3,500 “bad numbers” were bad (by definition). Additionally, it is reported in the declaration that 556 people literally said that they did not request a ballot, and there is no reason not to take them at their word. We don’t know what fraction of the 17,000 individuals who were called and did not pick up or hung up were wrong numbers, but we do know that the number of bad numbers out of the total must equal the number of bad numbers among the 17,000 plus those we know for sure were bad numbers, i.e. $23{,}184 \cdot b = 17{,}000 \cdot b + 556 + 3{,}500$. Solving for $b$ we find that $b = \frac{4{,}056}{6{,}184} \approx \frac{2}{3}$. I’m surprised the number is so low. One would expect that individuals who requested ballots, but then didn’t send them in, would be enriched for people who have recently moved or are in the process of moving, or have other issues making it difficult or impossible to reach them at all.

The fraction of bad calls derived translates to roughly 1,760 bad numbers out of the 2,684 people that were reached. This easily explains not only the 556 individuals who said they did not request a ballot, but also the 463 individuals who said that they mailed back their ballots. In the case of the latter there is no irregularity; the number of bad calls suggests that all those individuals were reached in error and their ballots were legitimately counted so they weren’t part of the 165,412. It also explains the 544 individuals who said they voted in person.

That’s it. The data don’t point to any fraud or irregularity, just a poorly designed poll with poor response rates and lots of erroneous information due to bad phone numbers. There is nothing to explain.

Miller, on the other hand, has some things to explain. First, I note that his declaration begins with a signed page asserting various facts about Steven Miller and the analysis he performed.
Notably absent from the page, or anywhere else in the document, is a disclosure of the funding source for the work and of conflicts of interest. On his work webpage, Miller specifically states that one should always acknowledge funding support.

Second, if Miller really wanted to understand the reason why some ballots were requested for mail-in, but had not yet arrived to be counted, he would also obtain data from Democrats. That would provide a control on various aspects of the analysis, and help to establish whether irregularities, if they were to be detected, were of a partisan nature. Why did Miller not include an analysis of such data?

Third, one might wonder why Steven Miller chose to publish this “declaration”. Surely a professor who has taught probability and statistics for 15 years (as Miller claims he has) must understand that his own “analysis” is fundamentally flawed, right? Then again, I’ve previously found that excellent pure mathematicians are prone to falling into a data analysis trap, i.e. a situation where their lack of experience analyzing real-world datasets leads them to trust a naïve analysis that is deeply flawed. To better understand whether this might be the case with Miller, I examined his publication record, which he has shared publicly via Google Scholar, to see whether he has worked with data. The first thing I noticed was that he has published more than 700 articles (!) and has an h-index of 47 for a total of 8,634 citations… an incredible record for any professor, and especially for a mathematician. A Google search for his name displays this impressive number of citations:

As it turns out, his impressive publication record is a mirage. When I took a closer look I found that many of the papers he lists on his Google Scholar page are not his, but rather articles published by other authors with the name S Miller. “His” most cited article was published in 1955, well before he was born.
Miller’s own most cited paper is a short unpublished tutorial on least squares (I was curious and reviewed it as well only to find some inaccuracies but hey, I don’t work for this guy).

I will note that in creating his Google Scholar page, Miller did not just enter his name and email address (required). He went to the effort of customizing the page, including the addition of keywords and a link to his homepage, and in doing so followed his own general advice to curate one’s CV (strangely, he also dispenses advice on job interviews, including about shaving; I guess only women interview for jobs?). But I digress: the question is, why does his Google Scholar page display massively inflated publication statistics based on papers that are not his? I’ve seen this before, and in one case where I had hard evidence that it was done deliberately to mislead I reported it as fraud. Regardless of Miller’s motivations, by looking at his actual publications I confirmed what I suspected, namely that he has hardly any experience analyzing real world data. I’m willing to chalk up his embarrassing “declaration” to statistics illiteracy and naïveté.

In summary, Steven Miller’s declaration provides no evidence whatsoever of voter fraud in Pennsylvania.

Lior Pachter
Division of Biology and Biological Engineering & Department of Computing and Mathematical Sciences
California Institute of Technology

Abstract

A recently published pilot study on the efficacy of 25-hydroxyvitamin D3 (calcifediol) in reducing ICU admission of hospitalized COVID-19 patients concluded that the treatment “seems able to reduce the severity of disease, but larger trials with groups properly matched will be required to show a definitive answer”. In a follow-up paper, Jungreis and Kellis re-examine this so-called “Córdoba study” and argue that the authors of the study have undersold their results.
Based on a reanalysis of the data in a manner they describe as “rigorous” and using “well established statistical techniques”, they urge the medical community to “consider testing the vitamin D levels of all hospitalized COVID-19 patients, and taking remedial action for those who are deficient.” Their recommendation is based on two claims: in an examination of unevenness in the distribution of one of the comorbidities between cases and controls, they conclude that there is “no evidence of incorrect randomization”, and they present a “mathematical theorem” to make the case that the effect size in the Córdoba study is significant to the extent that “they can be confident that if assignment to the treatment group had no effect, we would not have observed these results simply due to chance.”

Unfortunately, the “mathematical analysis” of Jungreis and Kellis is deeply flawed, and their “theorem” is vacuous. Their analysis cannot be used to conclude that the Córdoba study shows that calcifediol significantly reduces ICU admission of hospitalized COVID-19 patients. Moreover, the Córdoba study is fundamentally flawed, and therefore there is nothing to learn from it.

The Córdoba study

The Córdoba study, described by the authors as a pilot, was ostensibly a randomized controlled trial, designed to determine the efficacy of 25-hydroxyvitamin D3 in reducing ICU admission of hospitalized COVID-19 patients. The study consisted of 76 patients hospitalized for COVID-19 symptoms, with 50 of the patients treated with calcifediol, and 26 not receiving treatment. Patients were administered “standard care”, which according to the authors consisted of “a combination of hydroxychloroquine, azithromycin, and for patients with pneumonia and NEWS score 5, a broad spectrum antibiotic”. Crucially, admission to the ICU was determined by a “Selection Committee” consisting of intensivists, pulmonologists, internists, and members of an ethics committee.
The Selection Committee based ICU admission decisions on the evaluation of several criteria, including presence of comorbidities, and the level of dependence of patients according to their needs and clinical criteria.

The result of the Córdoba trial was that only 1/50 of the treated patients was admitted to the ICU, whereas 13/26 of the untreated patients were admitted (p-value = $7.7 \times 10^{-7}$ by Fisher’s exact test). This is a minuscule p-value but it is meaningless. Since there is no record of the Selection Committee deliberations, it is impossible to know whether the ICU admission of the 13 untreated patients was due to their previous high blood pressure comorbidity. Perhaps the 11 treated patients with the comorbidity were not admitted to the ICU because they were older, and the Selection Committee considered their previous higher blood pressure to be more “normal” (14/50 treatment patients were over the age of 60, versus only 5/26 of the untreated patients).

Figure 1: Table 2 from [1] showing the comorbidities of patients. It is reproduced by virtue of [1] being published open access under the CC-BY license.

The fact that admission to the ICU could be decided in part based on the presence of comorbidities, and that there was a significant imbalance in one of the comorbidities, immediately renders the study results meaningless. There are several other problems with it that potentially confound the results: the study did not examine the Vitamin D levels of the treated patients, nor was the untreated group administered a placebo. Most importantly, the study numbers were tiny, with only 76 patients examined. Small studies are notoriously problematic, and are known to produce large effect sizes [9]. Furthermore, sloppiness in the study does not inspire confidence in the results.
The authors state that the “rigorous protocol” for determining patient admission to the ICU is available as Supplementary Material, but there is no Supplementary Material distributed with the paper. There is also an embarrassing typo: Fisher’s exact test is referred to twice as “Fischer’s test”. To err once in describing this classical statistical test may be regarded as misfortune; to do it twice looks like carelessness.

A pointless statistics exercise

The Córdoba study has not received much attention, which is not surprising considering that by the authors’ own admission it was a pilot that at best only motivates a properly matched and powered randomized controlled trial. Indeed, the authors mention that such a trial (the COVIDIOL trial), with data being collected from 15 hospitals in Spain, is underway. Nevertheless, Jungreis and Kellis [3], apparently mesmerized by the $7.7 \times 10^{-7}$ p-value for ICU admission upon treatment, felt the need to “rescue” the study with what amounts to faux statistical gravitas. They argue for immediate consideration of testing Vitamin D levels of hospitalized patients, so that “deficient” patients can be administered some form of Vitamin D “to the extent it can be done safely”. Their message has been noticed; only a few days after [3] appeared, the authors’ tweet to promote it has been retweeted more than 50 times [8].

Jungreis and Kellis claim that the p-value for the effect of calcifediol on patients is so significant that in and of itself it merits belief that administration of calcifediol does, in fact, prevent admission of patients to ICUs.
To make their case, Jungreis and Kellis begin by acknowledging that imbalance between the treated and untreated groups in the previous high blood pressure comorbidity may be a problem, but claim that there is “no evidence of incorrect randomization.” Their argument is as follows: they note that while the p-value for the imbalance in the previous high blood pressure comorbidity is 0.0023, it should be adjusted for the fact that there are 15 distinct comorbidities, and that just by chance, when computing so many p-values, one might be small. First, an examination of Table 2 in [1] (Figure 1) shows that there were only 14 comorbidities assessed, as none of the patients had previous chronic kidney disease. Thus, the number 15 is incorrect. Second, Jungreis and Kellis argue that a Bonferroni correction should be applied, and that this correction should be based on 30 tests (= 15 × 2). The reason for the factor of 2 is that they claim that when testing for imbalance, one should test for imbalance in both directions. By applying the Bonferroni correction to the p-values, they derive a “corrected” p-value for previous high blood pressure being imbalanced between groups of 0.069. They are wrong on several counts in deriving this number. To illustrate the problems we work through the calculation step-by-step:

The question we want to answer is as follows: given that there are multiple comorbidities, is there a significant imbalance in at least one comorbidity? There are several ways to test for this, with the simplest being Šidák’s correction [10] given by

$q = 1-(1-m)^n,$

where $m$ is the minimum p-value among the comorbidities, and $n$ is the number of tests. Plugging in $m = 0.0023$ (the smallest p-value in Table 2 of [1]) and $n = 14$ (the number of comorbidities) one gets 0.032 (note that the Bonferroni correction $nm$ used by Jungreis and Kellis is the first-order Taylor approximation to the Šidák correction when $m$ is small).
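The numbers in this calculation are easy to reproduce. The following is a minimal sketch (the function names are my own) that evaluates the Šidák correction for $m = 0.0023$ and $n = 14$, alongside the Bonferroni approximation for both $n = 14$ and the $n = 30$ used by Jungreis and Kellis.

```python
def sidak(min_p, n_tests):
    """Sidak correction: probability that the smallest of n_tests
    independent p-values is at most min_p under the null."""
    return 1.0 - (1.0 - min_p) ** n_tests


def bonferroni(min_p, n_tests):
    """Bonferroni correction: the first-order approximation n * m, capped at 1."""
    return min(1.0, n_tests * min_p)


m = 0.0023  # smallest p-value among the comorbidities in Table 2 of [1]

print(f"Sidak, n = 14:      {sidak(m, 14):.3f}")
print(f"Bonferroni, n = 14: {bonferroni(m, 14):.3f}")
print(f"Bonferroni, n = 30: {bonferroni(m, 30):.3f}")
```

With $n = 14$ both corrections give approximately 0.032, below the 0.05 threshold, whereas the 0.069 reported by Jungreis and Kellis arises only from using $n = 30$.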
The Šidák correction is based on an assumption that the tests are independent. However, that is certainly not the case in the Córdoba study. For example, “having at least one prognostic factor” is itself one of the comorbidities tabulated, and it clearly depends on the others. In other words, the p-value obtained is conservative.

The calculation above uses $n = 14$, but Jungreis and Kellis reason that the number of tests is 30 = 15 × 2, to take into account an imbalance in either the treated or untreated direction. Here they are assuming two things: that two-sided tests for each comorbidity will produce double the p-value of a one-sided test, and that two-sided tests are the “correct” tests to perform. They are wrong on both counts. First, the two-sided Fisher exact test does not, in general, produce a p-value that is double that of the one-sided test. The study result is a good example: 1/50 treated patients admitted to the ICU vs. 13/26 untreated patients produces a p-value of $7.7 \times 10^{-7}$ for both the one-sided and two-sided tests. Jungreis and Kellis do not seem to know this can happen, nor understand why; they go to great lengths to explain the importance of conducting a one-sided test for the study result. Second, there is a strong case to be made that a one-sided test is the correct test to perform for the comorbidities. The concern is not whether there was an imbalance of any sort, but whether the imbalance would skew results by virtue of the study including too many untreated individuals with comorbidities. In any case, if one were to give Jungreis and Kellis the benefit of the doubt, and perform a two-sided test, the corrected p-value for the previous high blood pressure comorbidity is 0.06 and not 0.069.

The most serious mistake that Jungreis and Kellis make, however, is in claiming that one can accept the null hypothesis of a hypothesis test when the p-value is greater than 0.05.
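As an aside on the one-sided versus two-sided point made earlier in this section, the Fisher exact test for the study result can be computed directly from the hypergeometric distribution. The sketch below uses only the Python standard library and the counts quoted in the text (1 of 50 treated and 13 of 26 untreated patients admitted to the ICU). The function names are my own, and the two-sided p-value is taken to be the sum of the probabilities of all tables no more likely than the observed one, which is one common convention.

```python
from math import comb


def hypergeom_pmf(k, total, icu_total, treated):
    """P(exactly k of the icu_total ICU admissions are treated patients)
    when admissions are unrelated to treatment."""
    return (comb(treated, k) * comb(total - treated, icu_total - k)
            / comb(total, icu_total))


def fisher_exact(treated_icu, treated_n, untreated_icu, untreated_n):
    """Return (one_sided, two_sided) Fisher exact p-values for a 2x2 table.

    one_sided: probability of this few or fewer treated ICU admissions.
    two_sided: total probability of tables no more likely than the observed one.
    """
    total = treated_n + untreated_n
    icu_total = treated_icu + untreated_icu
    lowest = max(0, icu_total - untreated_n)
    highest = min(icu_total, treated_n)
    pmf = {k: hypergeom_pmf(k, total, icu_total, treated_n)
           for k in range(lowest, highest + 1)}
    observed = pmf[treated_icu]
    one_sided = sum(p for k, p in pmf.items() if k <= treated_icu)
    # small relative tolerance guards against floating point ties
    two_sided = sum(p for p in pmf.values() if p <= observed * (1 + 1e-7))
    return one_sided, two_sided


one_sided, two_sided = fisher_exact(1, 50, 13, 26)
print(f"one-sided p-value: {one_sided:.2e}")
print(f"two-sided p-value: {two_sided:.2e}")
```

For this table every outcome in the opposite tail is more probable than the observed one, so nothing is added to the two-sided sum and the two p-values coincide rather than differing by a factor of 2.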
The corrected p-value they obtain for the comorbidity imbalance is 0.069 which, even if it is taken at face value, is not grounds for claiming, as Jungreis and Kellis do, that “this is not significant evidence that the assignment was not random” and reason to conclude that there is “no evidence of incorrect randomization”. That is not how p-values work. A p-value less than 0.05 allows one to reject the null hypothesis (assuming 0.05 is the threshold chosen), but a p-value above the chosen threshold is not grounds for accepting the null. Moreover, the Šidák-corrected p-value with $n = 14$ is 0.032, which is certainly grounds for rejecting the null hypothesis that the assignment was random.

Correction of the incorrect Jungreis and Kellis statistics may be a productive exercise in introductory undergraduate statistics for some, but it is pointless insofar as assessing the Córdoba study. While the extreme imbalance in the previous high blood pressure comorbidity is problematic because patients with the comorbidity may be more likely to get sick and require ICU admission, the study was so flawed that the exact p-value for the imbalance is a moot point. Given that the presence of comorbidities, not just their effect on patients, was a factor in determining which patients were admitted to the ICU, the extreme imbalance in the previous high blood pressure comorbidity renders the result of the study meaningless ex facie.

A definition is not a theorem is not proof of efficacy

In an effort to fend off criticism that the comorbidities of patients were improperly balanced in the study, Jungreis and Kellis go further and present a “theorem” they claim shows that there was a minuscule chance that an uneven distribution of comorbidities could render the study results not significant.
The “theorem” is stated twice in their paper, and I’ve copied both theorem statements verbatim from their paper:

Theorem 1 In a randomized study, let $p$ be the p-value of the study results, and let $q$ be the probability that the randomization assigns patients to the control group in such a way that the values of $P_{\text{prognostic}}(\text{Patient})$ are sufficiently unevenly distributed between the treatment and control groups that the result of the study would no longer be statistically significant at the 95% level after controlling for the prognostic risk factors. Then $q < \frac{p}{0.05}$.

According to Jungreis and Kellis, $P_{\text{prognostic}}(\text{Patient})$ is the following: “There can be any number of prognostic risk factors, but if we knew what all of them were, and their effect sizes, and the interactions among them, we could combine their effects into a single number for each patient, which is the probability, based on all known and yet-to-be discovered risk factors at the time of hospital admission, that the patient will require ICU care if not given the calcifediol treatment. Call this (unknown) probability $P_{\text{prognostic}}(\text{Patient})$.”

The theorem is restated in the Methods section of the Jungreis and Kellis paper as follows:

Theorem 2 In a randomized controlled study, let $p$ be the p-value of the study outcome, and let $q$ be the probability that the randomization distributes all prognostic risk factors combined sufficiently unevenly between the treatment and control groups that when controlling for these prognostic risk factors the outcome would no longer be statistically significant at the 95% level. Then $q < \frac{p}{0.05}$.

While it is difficult to decipher the language the “theorem” is written in, let alone its meaning (note Theorem 1 and Theorem 2 are supposedly the same theorem), I was able to glean something about its content from reading the “proof”.
The mathematical content of whatever the theorem is supposed to mean, is the definition of conditional probability, namely that if A and B are events with $P(B) > 0$, then $P(A|B) := \frac{P(A \cap B)}{P(B)}$. To be fair to Jungreis and Kellis, the “theorem” includes the observation that $P(A \cap B) \leq P(A) \quad \Rightarrow \quad P(A|B) \leq \frac{P(A)}{P(B)}.$ This is not, by any stretch of the imagination, a “theorem”; it is literally the definition of conditional probability followed by an elementary inequality. The most generous interpretation of what Jungreis and Kellis were trying to do with this “theorem”, is that they were showing that the p-value for the study is so small, that it is small even after being multiplied by 20. There are less generous interpretations.

Does Vitamin D intake reduce ICU admission?

There has been a lot of interest in Vitamin D and its effects on human health over the past decade [2], and much speculation about its relevance for COVID-19 susceptibility and disease severity. One interesting result on disease susceptibility was published recently: in a study of 489 patients, it was found that the relative risk of testing positive for COVID-19 was 1.77 times greater for patients with likely deficient vitamin D status compared with patients with likely sufficient vitamin D status [7]. However, definitive results on Vitamin D and its relationship to COVID-19 will have to await larger trials. One such trial, a large randomized clinical trial with 2,700 individuals sponsored by Brigham and Women’s Hospital, is currently underway [4]. While this study might shed some light on Vitamin D and COVID-19, it is prudent to keep in mind that the outcome is not certain. Vitamin D levels are confounded with many socioeconomic factors, making the identification of causal links difficult. In the meantime, it has been suggested that it makes sense for individuals to maintain reference nutrient intakes of Vitamin D [6].
Such a public health recommendation is not controversial.

As for Vitamin D administration to hospitalized COVID-19 patients reducing ICU admission, the best one can say about the Córdoba study is that nothing can be learned from it. Unfortunately, the poor study design, small sample size, availability of only summary statistics for the comorbidities, and imbalanced comorbidities among treated and untreated patients render the data useless. While it may be true that calcifediol administration to hospital patients reduces subsequent ICU admission, it may also not be true. Thus, the follow-up by Jungreis and Kellis is pointless at best. At worst, it is irresponsible propaganda, advocating for potentially dangerous treatment on the basis of shoddy arguments masked as “rigorous and well established statistical techniques”. It is surprising to see Jungreis and Kellis argue that it may be unethical to conduct a placebo randomized controlled trial, which is one of the most powerful tools in the development of safe and effective medical treatments. They write “the ethics of giving a placebo rather than treatment to a vitamin D deficient patient with this potentially fatal disease would need to be evaluated.” The evidence for such a policy is currently non-existent. On the other hand, there are plenty of known risks associated with excess Vitamin D [5].

References

1. Marta Entrenas Castillo, Luis Manuel Entrenas Costa, José Manuel Vaquero Barrios, Juan Francisco Alcalá Díaz, José López Miranda, Roger Bouillon, and José Manuel Quesada Gomez. Effect of calcifediol treatment and best available therapy versus best available therapy on intensive care unit admission and mortality among patients hospitalized for COVID-19: A pilot randomized clinical study. The Journal of steroid biochemistry and molecular biology, 203:105751, 2020.
2. Michael F Holick. Vitamin D deficiency. New England Journal of Medicine, 357(3):266–281, 2007.
3. Irwin Jungreis and Manolis Kellis. Mathematical analysis of Córdoba calcifediol trial suggests strong role for Vitamin D in reducing ICU admissions of hospitalized COVID-19 patients. medRxiv, 2020.
4. JoAnn E Manson. https://clinicaltrials.gov/ct2/show/nct04536298.
5. Ewa Marcinowska-Suchowierska, Małgorzata Kupisz-Urbańska, Jacek Łukaszkiewicz, Paweł Płudowski, and Glenville Jones. Vitamin D toxicity–a clinical perspective. Frontiers in endocrinology, 9:550, 2018.
6. Adrian R Martineau and Nita G Forouhi. Vitamin D for COVID-19: a case to answer? The Lancet Diabetes & Endocrinology, 8(9):735–736, 2020.
7. David O Meltzer, Thomas J Best, Hui Zhang, Tamara Vokes, Vineet Arora, and Julian Solway. Association of vitamin D status and other clinical characteristics with COVID-19 test results. JAMA network open, 3(9):e2019722–e2019722, 2020.
8. Vivien Shotwell. https://tweetstamp.org/1327281999137091586.
9. Robert Slavin and Dewi Smith. The relationship between sample sizes and effect sizes in systematic reviews in education. Educational evaluation and policy analysis, 31(4):500–506, 2009.
10. Lynn Yi, Harold Pimentel, Nicolas L Bray, and Lior Pachter. Gene-level differential analysis at transcript-level resolution. Genome biology, 19(1):53, 2018.

“It is not easy when people start listening to all the nonsense you talk. Suddenly, there are many more opportunities and enticements than one can ever manage.”
– Michael Levitt, Nobel Prize in Chemistry, 2013

In 1990 Glendon MacGregor, a restaurant waiter in Pretoria, South Africa, set up an elaborate hoax in which he posed as the crown prince of Liechtenstein to organize for himself a state visit to his own country. Amazingly, the ruse lasted for two weeks, and during that time MacGregor was wined and dined by numerous South African dignitaries. He had a blast in his home town, living it up in a posh hotel, and enjoying a trip to see the Blue Bulls in Loftus Versfeld stadium.
The story is the subject of the 1993 Afrikaans film “Die Prins van Pretoria” (The Prince of Pretoria).

Now, another Pretorian is at it, except this time not for two weeks but for several months. And, unlike MacGregor’s hoax, this one does not just embarrass a government and leave it with a handful of hotel and restaurant bills. This hoax risks lives.

Michael Levitt, a Stanford University Professor of structural biology and winner of the Nobel Prize in Chemistry in 2013, wants you to believe the COVID-19 pandemic is over in the US. He claimed it ended on August 22nd, with a total of 170,000 deaths (there are now over 200,000 with hundreds of deaths per day). He claims those 170,000 deaths weren’t even COVID-19 deaths, and since the virus is not very dangerous, he suggests you infect yourself. How? He proposes you set sail on a COVID-19 cruise.

Levitt’s lunacy began with an attempt to save the world from epidemiologists. Levitt presumably figured this would not be a difficult undertaking, because, he has noted, “epidemiologists see their job not as getting things correct”. I guess he figured that he could do better than that. On February 25th of this year, at a time when there had already been 2,663 deaths due to the SARS-CoV-2 virus in China but before the World Health Organization had declared the COVID-19 outbreak a pandemic, he delivered what sounded like good news. He predicted that the virus had almost run its course, and that the final death toll in China would be 3,250. This turned out to be a somewhat optimistic prediction. As of the writing of this post (September 21, 2020), there have been 4,634 reported COVID-19 deaths in China, and there is reason to believe that the actual number of deaths has been far higher (see, e.g. He et al., 2020, Tsang et al., 2020, Wadham and Jacobs, 2020).

Instead of publishing his methods or waiting to evaluate the veracity of his claims, Levitt signed up for multiple media interviews.
Emboldened by “interest in his work” (who doesn’t want to interview a Nobel laureate?), he started making more predictions of the form “COVID-19 is not a threat and the pandemic is over”. On March 20th he said that “he will be surprised if the number of deaths in Israel surpasses 10”. Unfortunately, there have been 1,256 COVID-19 deaths in Israel so far with a massive increase in cases over the past few weeks and no end to the pandemic in sight. On March 28th, when Switzerland had 197 deaths, he predicted the pandemic was almost over and would end with 250. Switzerland has now seen 1,762 deaths and a recent dramatic increase in cases has overwhelmed hospitals in some regions, leading to new lockdown measures. Levitt’s predictions have come loose and fast. On June 28th he predicted deaths in Brazil would plateau at 98,000. There have been over 137,000 deaths in Brazil with hundreds of people dying every day now. In Italy he predicted on March 28th that the pandemic was past its midpoint and deaths would end at 17,000 – 20,000. There have now been 35,707 deaths in Italy. The way he described the situation in the country at the time, when crematoria were overwhelmed, was “normal”.

I became aware of Levitt’s predictions via an email list of the Fellows of the International Society of Computational Biology on March 14th. I’ve been a Fellow for 3 years, and during this time I’ve received hardly any mail, except during Fellow nomination season. It was therefore somewhat of a surprise to start receiving emails from Michael Levitt regarding COVID-19, but it was a time when scientists were scrambling to figure out how they could help with the pandemic and I was excited at the prospect of all of us learning from each other and possibly helping out. Levitt began by sending around a PDF via a Dropbox link and asked for feedback.
I wrote back right away suggesting he distribute the code used to make the figures, make clear the exact versions of data he was scraping to get the results (with dates and copies so the work could be replicated), suggested he add references and noted there were several typos (e.g. the formula $D_n = C_nP_0 + C_{n-1}P_1 + C_{n-2}P_2 + \ldots + C_{n-29}P_2$ clearly had wrong indices). I asked that he post it on the bioRxiv so it could receive community feedback, and suggested he fill in some details so I and others could better evaluate the methods (e.g. I pointed out that I thought the use of a Gaussian for $P_n$ was problematic). The initial correspondence rapidly turned into a flurry of email on the ISCB Fellows list. Levitt was full of advice. He suggested everyone wear a mask and I and others pushed back noting, as Dr. Anthony Fauci did at the time, that there was a severe shortage of masks and they should go to doctors first. Several exchanges centered on who to blame for the pandemic (one Fellow suggested immigrants in Italy). Among all of this, there was one constant: Levitt’s COVID-19 advice and predictions kept on coming, and without reflection or response to the well-meaning critiques. After Levitt said he’d be surprised if there were more than 10 deaths in Israel, and after he refused to send code reproducing his analyses, or post a preprint, I urged my fellow Fellows in ISCB to release a statement distancing our organization from his opinions, and emphasizing the need for rigorous, reproducible work. I was admonished by two colleagues and told, in so many words, to shut up. Meanwhile, Levitt did not shut up. In March, after talking to Israeli newspapers about how he would be surprised if there were more than 10 deaths, he spoke directly to Israeli Prime Minister Benjamin Netanyahu to deliver his message that Israel was overreacting to the virus (he tried to speak to US president Donald Trump as well). 
Israel is now in a very dangerous situation with COVID-19 out of control. It has the highest number of cases per capita in the world. Did Levitt play a role in this by helping to convince Netanyahu to ease restrictions in the country in May? We may never know. There were likely many factors contributing to Israel’s current tragedy but Levitt, by virtue of speaking directly to Netanyahu, should be scrutinized for his actions.

What we do know is that at the time, he was making predictions about the nature and expected course of the virus with unpublished methods (i.e. not even preprinted), poorly documented data, and without any possibility for anyone to reproduce any of his work. His disgraceful scholarship has not improved in the subsequent months. He did eventually post a preprint, but the data tab states “all data to be made available” and there is the following paragraph relating to availability of code:

We would like to make the computer codes we use available to all but these are currently written in a variety of languages that few would want to use. While Dr. Scaiewicz uses clean self-documenting Jupyter Python notebook code, Dr. Levitt still develops in a FORTRAN dialect call Mortran (Mortran 1975) that he has used since 1980. The Mortran preprocessor produces Fortran that is then converted to C-code using f2c. This code is at least a hundred-fold faster than Python code. His other favorite language is more modern, but involves the use of the now deprecated language Perl and Unix shell scripts. Nevertheless, the methods proposed here are simple; they are easily and quickly implemented by a skilled programmer. Should there be interest, we would be happy to help others develop the code and test them against ours. We also realize that there is ample room for code optimization. Some of the things that we have considered are pre-calculating sums of terms to convert computation of the correlation coefficient from a sum over N terms to the difference of two sums.
Another way to speed the code would be to use hierarchical step sizes in a binary search to find the value of lnN that gives the best straight line. Our study involving as it did a small group working in different time zones and under extreme time pressure revealed that scientific computation nowadays faces a Babel of computer languages. In some ways this is good as we generally re-coded things rather than struggle with the favorite language of others. Still, we worry about the future of science when so many different tools are used. In this work we used Python for data wrangling and some plotting, Perl and Unix shell tools for data manipulation, Mortran (effectively C++) for the main calculations, xmgrace and gnuplot for other plotting, Excel (and Openoffice) for playing with data. And this diversity is for a group of three!

tl;dr, there is no code. I’ve asked Michael Levitt repeatedly for the code to reproduce the figures in his paper and have not received it. I can’t reproduce his plots.

Levitt now lies when confronted about his misguided and wrong prediction about COVID-19 in Israel. He claims it is a “red herring”, and that he was talking about “excess deaths”. I guess he figures he can hide behind Hebrew. There is a recording where anyone can hear him being asked directly if he is saying he will be surprised with more than 10 COVID-19 deaths in Israel, and his answer is very clear: “I will be very surprised”. It is profoundly demoralizing to discover that a person you respected is a liar, a demagogue or worse. Sadly, this has happened to me before.

Levitt continues to put people’s lives at risk by spewing lethal nonsense. He is suggesting that we should let COVID-19 spread in the population so it will mutate to be less harmful. This is nonsense. He is promoting anti-vax conspiracy theories that are nonsense. He is promoting nonsense conspiracy theories about scientists. And yet, he continues to have a prominent voice. It’s not hard to see why.
The article, similar to all the others where he is interviewed, begins with “Nobel Prize winner…”

In the Talmud, in Mishnah Sanhedrin 4:9, it is written “Whoever destroys a soul, it is considered as if he destroyed an entire world”. I thought of this when listening to an interview with Michael Levitt that took place in May, where he said:

I am a real baby-boomer, I was born in 1947, and I think we’ve really screwed up. We cause pollution, we allowed the world’s population to increase three-fold, we’ve caused the problems of global warming, we’ve left your generation with a real mess in order to save a really small number of very old people. If I was a young person now, I would say, “now you guys are gonna pay for this.”

Despite much ado about the #metoo movement in recent years, the crisis of sexual harassment in academia persists without an end in sight. The academic sexual misconduct database now lists 1,051 cases, each of them a tragedy of trauma, unspeakable violations of victims, and dreams destroyed. I’ve written previously about two cases listed in the database (Yuval Peres and Terry Speed). Now, I feel compelled to write about yet another sexual harassment case.

Adrian Dumitrescu is a professor in the Department of Mathematical Sciences at the University of Wisconsin, Milwaukee. I have known of his work for many years, as we have a shared interest in extremal combinatorics, having both worked on the Erdős-Szekeres “Happy Ending Problem”. Last week a Facebook post was brought to my attention, in which a graduate student describes a horrible case of sexual harassment by Prof. Dumitrescu that occurred during a conference in Boston in 2016. This student filed a Title IX complaint with the University of Wisconsin, and I have a copy of the report.
The Office of Equity and Diversity (EDS) that investigated the case found that “Based on the totality of the circumstances, the information obtained pursuant to this investigation, and for all the reasons set forth above, EDS concludes that there is sufficient evidence to support a finding, by preponderance of the evidence, of sexual harassment against the Respondent [Prof. Dumitrescu].” Furthermore, the report states that “based on the seriousness of the Respondent’s conduct, EDS believes that disciplinary action is warranted in this matter, and recommends that the Provost refer this case for imposition of discipline”. As I write this post, Prof. Dumitrescu is still listed as a professor at the University of Wisconsin, Milwaukee.

Notably, after being sexually harassed by the Respondent, and before filing a report with Title IX, the student consulted her Ph.D. advisor. The report describes his response as follows: “[he] told her that the Respondent had a ‘high reputation’ in the field and it was better to ‘avoid trouble’ and not to report her concerns.” And yet she had the courage to report the case, despite the attempt to silence her, and despite having been threatened by the Respondent, as he coerced her to sleep with him, that if she did not acquiesce to his demands he would not conduct research with her and he might prevent senior scholars at her university from working with her.

The report details how the sexual harassment impacted the complainant’s research progress and mental well-being. Yet again, a talented young scientist finds herself with debilitating trauma, a career in jeopardy, and powerless in the face of an establishment that excuses harassers.

The details of this case are of course different from those of every other sexual harassment case. Each is tragic in its own way. And yet elements of what happened here are to be found in all sexual harassment cases. Power imbalance. Coercion. Threats. Silencing of the victim. Inaction. Banal injustice.
This will be case number 1,052 in the academic sexual misconduct database. We must do better.

Update June 11, 2021: Adrian Dumitrescu left the University of Wisconsin, Milwaukee in January 2021.

Rapid testing has been a powerful tool to control COVID-19 outbreaks around the world (see Iceland, Germany, …). While many countries support testing through government sponsored healthcare infrastructure, in the United States COVID-19 testing has largely been organized and provided by for-profit businesses. While financial incentives coupled with social commitment have motivated many scientists and engineers at companies and universities to work hard around the clock to facilitate testing, there are also many individuals who have succumbed to greed. Opportunism has bubbled to the surface and scams, swindles, rackets, misdirection and fraud abound. This is happening at a time when workplaces are in desperate need of testing, and demands for testing are likely to increase as schools, colleges and universities start opening up in the next month. Here are some examples of what is going on:
• First and foremost there is your basic fraud. In July, a company called “Fillakit”, which had been awarded a $10.5 million federal contract to make COVID-19 test kits, was shipping unusable, contaminated, soda bottles. This “business”, started by some law and real estate guy called Paul Wexler, who has been repeatedly accused of fraud, went under two months after it launched amidst a slew of investigations and complaints from former workers. Oh, BTW, Michigan ordered 322,000 Fillakit tubes which went straight to the trash (as a result they could not do a week’s worth of tests).
• Not all fraud is large scale. Some former VP at now defunct “Cure Cannabis Solutions” ordered 100 COVID-19 test kits that do who-knows-what at a price of 50c a kit. The Feds seized it. These kits, which were not FDA approved, were sourced from “Anhui DeepBlue Medical” in Hefei, China.
• To be fair, the Cannabis guy was small fry. In Laredo, Texas, some guy called Robert Castañeda received assistance from a congressman to purchase $500,000 of kits from the same place! Anhui DeepBlue Medical sent Castañeda 20,000 kits ($25 a test). Apparently the tests had 20% accuracy. To his credit, the Cannabis guy paid 1/50th the price for this junk.
• Let’s not forget who is really raking in the bucks though. Quest Diagnostics and LabCorp are the primary testing outfits in the US right now; each is processing around 150,000 tests a day. These are for-profit companies and indeed they are making a profit. The economics is simple: insurance companies reimburse LabCorp and Quest Diagnostics for the tests. The rates are basically determined by the amount that Medicare will pay, i.e. the government price point. Initially, the reimbursement was set at $51, and well… at that price LabCorp and Quest Diagnostics just weren’t that interested. I mean, people have to put food on the table, right? (Adam Schechter, CEO of LabCorp, makes $4.9 million a year; Steve Rusckowski, CEO of Quest Diagnostics, makes $9.9 million a year). So the Medicare reimbursement rate was raised to $100. The thing is, LabCorp and Quest Diagnostics get paid regardless of how long it takes to return test results. Some people are currently waiting 15 days to get results (narrator: such test results are useless).
• Perhaps a silver lining lies in the stock price of these companies. The title of this post is “$How to Profit From COVID-19 Testing$”. I guess being able to take a week or two to return a test result and still get paid $100 for something that cost $30 lifts the stock price… and you can profit!
• Let’s not forget the tech bros! A bunch of dudes in Utah from companies like Nomi, Domo and Qualtrics signed a two-month contract with the state of Utah to provide 3,000 tests a day. One of the tech executives pushing the initiative, called TestUtah, was a 37-year-old founder (of Nomi Health) by the name of Mark Newman. He admitted that “none of us knew anything about lab testing at the start of the effort”. Didn’t stop Newman et al. from signing more than $50 million in agreements with several states to provide testing. tl;dr: the tests had a poor limit of detection, samples were mishandled, throughput was much lower than promised etc. etc. and as a result they weren’t finding positive cases at rates similar to other testing facilities. The significance is summarized poignantly in a New Yorker piece about the debacle: “I might be sick, but I want to go see my grandma, who’s ninety-five. So I go to a TestUtah site, and I get tested. TestUtah tells me I’m negative. I go see grandma, and she gets sick from me because my result was wrong, because TestUtah ran an unvalidated test.” P.S. There has been a disturbing TestUtah hydroxychloroquine story going on behind the scenes. I include this fact because no post on fraud and COVID-19 would be complete without a mention of hydroxychloroquine.
• Maybe though, the tech bros will save the day. The recently launched $5 million COVID-19 X-prize is supposed to incentivize the development of “Frequent. Fast. Cheap. Easy.” COVID-19 testing. The goal is nothing less than to “radically change the world.” I’m hopeful, but I just hope they don’t cancel it like they did the genome prize. After all, their goal of “500 tests per week with 12 hour turnaround from sample to result” is likely to be outpaced by innovation just like what happened with genome sequencing. So in terms of making money from COVID-19 testing don’t hold your breath with this prize.
• As is evident from the examples above, one of the obstacles to quick riches with COVID-19 testing in the USA is the FDA. The thing about COVID-19 testing is that lying to the FDA on applications, or providing unauthorized tests, can lead to unpleasantries, e.g. jail. So some play it straight and narrow. Consider, for example, SeqOnce, which has developed the Azureseq SARS-CoV-2 RT-qPCR kit. These guys have an “EUA-FDA validated test”:
This is exactly what you want! You can click on “Order Now” and pay $3,000 for a kit that can be used to test several hundred samples (great price!) and the site has all the necessary information: IFUs (these are “instructions for use” that come with FDA authorized tests), validation results etc. If you look carefully you’ll see that administration of the test requires FDA approval. The company is upfront about this. Except the test is not FDA authorized; this is easy to confirm by examining the FDA Coronavirus EUA site. One can infer from a press release that they have submitted an EUA (Emergency Use Authorization) but while they claim it has been validated, nowhere does it say it has been authorized. Clever eh? Authorized, validated, authorized, validated, authorized, .. and here I was just about to spend $3,000 for a bunch of tests that cannot be currently legally administered. Whew! At least this is not fraud. Maybe it’s better called… I don’t know… a game? Other companies are playing similar games. Ginkgo Bioworks is advertising “testing at scale, supporting schools and businesses” with an “Easy to use FDA-authorized test” but again this seems to be a product that has “launched”, not one that, you know, actually exists; I could find no Ginkgo Bioworks test that works at scale that is authorized on the FDA Coronavirus EUA website, and it turns out that what they mean by FDA authorized is an RT-PCR test that they have outsourced to others. Fingers crossed though; maybe the marketing helped CEO Jason Kelly raise the $70 million his company has received for the effort; I certainly hope it works (soon)!
• By the way, I mentioned that the SeqOnce operation is a bunch of “guys”. I meant this literally; this is their “team”: Just one sec… what is up with biotech startups and 100% men leadership teams? (See Epinomics, Insight Genetics, Ocean Genomics, Kailos Genetics, Circulogene, etc. etc.)… And what’s up with the black and white thing?
Is that to try to hide that there are no people of color? I mention the 100% male team because if you look at all the people mentioned in this post, all of them are guys (except the person in the next sentence), and I didn’t plan that, it’s just how it worked out. Look, I’m not casting shade on the former CEO of Theranos. I’m just saying that there is a strong correlation here. Sorry, back to the regular programming…
• Speaking of swindlers and scammers, this post would not be complete without a mention of the COVID-19 testing czar, Jared Kushner. His secret testing plan for the United States went “poof into thin air”! I felt that the 1 million contaminated and unusable Chinese test kits that he ordered for $52 million deserved the final mention in this post. Sadly, these failed kits appear to be the main thrust of the federal response to COVID-19 testing needs so far, and consistent with Trump’s recent call to “slow the testing down” (he wasn’t kidding). Let’s see what turns up today at the hearings of the U.S. House Select Subcommittee on Coronavirus, whose agenda is “The Urgent Need for a National Plan to Contain the Coronavirus”.

Today, June 10th 2020, black academic scientists are holding a strike in solidarity with Black Lives Matter protests. I strike with them and for them. This is why:

I began to understand the enormity of racism against blacks thirty-five years ago when I was 12 years old. A single event, in which I witnessed a black man pleading for his life, opened my eyes. I don’t remember his face but I do remember looking at his dilapidated brown pants and noticing his hands shaking around the outside of his pockets while he pleaded for mercy:

The year was 1985, and I was visiting my friend Tamir Orbach at his house in Pretoria Tshwane, South Africa, located in Muckleneuk hill. We were playing in the courtyard next to Tamir’s garage, which was adjacent to a retaining wall and a wide gate. Google Satellite now enables virtual visits to anywhere in the world, and it took me seconds to find the house. The courtyard and retaining wall look the same. The gate we were playing in front of has changed color from white to black:

The house was located at the bottom of a short cul de sac on the slope of a hill. It’s difficult to see from the aerial photo, but in the street view, looking down, the steep driveway is visible. The driveway stones are the same as they were the last time I was at the house in the 1980s:

We heard some commotion at the top of the driveway. I don’t remember what we were doing at that moment, but I do remember seeing a man sprinting down the hill towards us. I remember being afraid of him. I was afraid of black men. A police officer was chasing him, gun in hand, shouting at the top of his lungs. The man ran into the neighboring property, scaled a wall to leap onto a roof, only to realize he may be trapped. He jumped back onto the driveway, dodged the cop, and ran back up the hill. I remember thinking that I had never seen a man run so fast. The policeman, by now out of breath but still behind the man, chased close behind with his gun swinging around wildly.

There was a second police officer, who was now visible standing at the top of the driveway, feet apart, and pointing a gun down at the man. We were in the line of fire, albeit quite far away behind the gate. The sprint ended abruptly when the man realized he had, in fact, been trapped. Tamir and I had been standing, frozen in place, watching the events unfold in front of us. Meanwhile the screaming had drawn one of our parents out of the house, concerned about the commotion and asking us what was going on. We walked, together, up the driveway to the street.

The man was being arrested next to a yellow police pickup truck, a staple of South African police at the time and an emblem of police brutality. The police pickup trucks had what was essentially a small jail cell mounted on the flat bed, and they were literal pick up trucks; their purpose was to pick up blacks off the streets.

Dogs were barking loudly in the back of the pickup truck and the man was sobbing.

The police were yelling at the man.

“Your passbook no good!! No pass!! Your passbook!! You’re going in with the dogs and coming with us!”

“Please… please… ” the man begged. I remember him crying. He was terrified of the dogs. They had started barking so loudly and aggressively that the vehicle was shaking. The man kept repeating “Please… not with the dogs… please… they will kill me. Please… help me. Please… the dogs will kill me.”

He was pleading for his life.

Law

The passbook the police were yelling about was a sort of domestic or internal passport all black people over the age of 16 were required to carry at all times in white areas. South Africa, in 1985, was a country that was racially divided. Some cities were for whites only. Some only for blacks. “Coloureds”, who were defined as individuals of mixed ancestry, were restricted to cities of their own. In his book “Born a Crime”, Trevor Noah describes how these anti-miscegenation laws resulted in it being impossible for him to legally live with his mother when he was a child. Note that Mississippi removed anti-miscegenation laws from its state constitution only in 1987 and Alabama in 2000.

The South African passbook requirement stemmed from a law passed in 1952, with origins dating back to British policies from the 18th century. The law had the following stipulation:

No black person could stay in a white urban area for more than 72 hours unless explicit permission was granted by an employer (required to be white).

The passbook contained behavioral evaluations from employers. Permission to enter an area could be revoked by any government employee for any reason.

All the live-in maids (as they were called) in Pretoria had passbooks permitting them to live (usually in an outhouse) on the property of their “employer”. I put “employer” in quotes because at best they would earn $250 a month (in today’s dollars, adjusted for inflation), would sleep in a small shack outside of a large home, and would receive a small budget for food which would barely cover millie pap. In many cases they lived in outhouses without running water, and were abused, beaten and raped. Live-in maids spent months at a time apart from their children and families; they couldn’t leave their jobs for fear of being fired and/or losing their pass permission. Their families couldn’t visit them as they did not have permission, by pass laws, to enter the white areas in which the live-in maids worked.

Most males had passbooks allowing them only day trips into the city from the black townships in which they lived. Many lived in Mamelodi, a township 15 miles east of Tshwane, and would travel hours to and from work because they were not allowed on white public transport. I lived in Pretoria for 13 years and I never saw Mamelodi.

I may have heard about passbooks before the incident at Tamir’s house, but I didn’t know what they were or how they worked. Learning about pass laws was not part of our social studies or history curriculum. At my high school, Pretoria Boys High School, a Milner school which counts among its alumni individuals such as dilettante Elon Musk and murderer Oscar Pistorius, we learned about the history of South Africa’s white architects, people like Cecil Rhodes (may his name and his memory be erased). There was one black boy in the school when I was there (out of about 1,200 students). He was allowed to attend because he was the son of an ambassador, as if somehow that mitigated his blackness.

South Africa started abandoning its pass laws in 1986, just a few months after the incident I described above. Helen Suzman described it at the time as possibly one of the most eminent government reforms ever enacted. Still, although this was a small step towards dismantling apartheid, Nelson Mandela was still in jail, in Pollsmoor Prison at that time, and he remained imprisoned for 3 more years until he was released from captivity after 27 years in 1990.

Order

We did not stand by idly while the man was being arrested. We asked the police to let him go, or at least to not throw him in with the dogs, but the cops ignored us and dragged the man towards the back of the van. The phrase “kicking and screaming” is bandied about a lot; there is even a sports comedy with that title. That day I saw a man literally kicking and screaming for his life. The back doors of the van were opened and the dogs, tugging against their leashes, appeared to be ready to devour him whole. He was tossed inside like a piece of meat.

The ferocity of the police dogs I saw that day was not a coincidence or accident, it was by design. South Africa, at one time, developed a breeding program at Roodeplaat Breeding Enterprises led by German geneticist Peter Geertshen to create a wolf-dog hybrid. Dogs were bred for their aggression and strength. The South African Boerboel is today one of the most powerful dog breeds in the world, and regularly kills in the United States, where it is imported from South Africa.

After encounters with numerous Boerboels, Dobermans, Rottweilers and Pitbull dogs as a child in South Africa I am scared of dogs to this day. I know it’s not rational, and some of my best friends and family have dogs that I adore and love, but the fear lingers. Sometimes I come across a K-9 unit and the terror surfaces. Police dogs are potent police weapons here, today, just as they were in South Africa in the 1980s. There is a long history of this here. Dogs were used to terrorize blacks in the Civil Rights era, and the recent invocation of “vicious dogs” by the president of the United States conjures up centuries of racial terror.

I learned at age 12 that LAW & ORDER isn’t all it’s hyped up to be.

I immigrated to America in August 1988, and imagined that here I would find a land free of the suffocating racism of South Africa. In my South African high school racism was open, accepted and embraced. Nigg*r balls were sold in the campus cafeteria (black licorice balls), and students would tell idiotic “jokes”  in which dead blacks were frequently the punchline. Some of the teachers were radically racist. My German teacher, Frau Webber, once told me and Tamir that she would swallow her pride and agree to teach us despite the fact that we were Jews. But much more pernicious was the systemic, underlying, racism. When I grew up the idea that someday I would go to university and study alongside a black person just seemed preposterous. My friends and I would talk about girls. The idea that any of us would ever date, let alone marry an African girl, was just completely and totally out of the realm of possibility. While my school, teachers and friends were what one would consider “liberal” in South Africa, e.g. many supported the ANC, their support of blacks was largely restricted to the right to vote.

Sadly, America was not the utopia I imagined. In 1989, a year after I immigrated here, Yusef Hawkins was murdered in a hate crime by white youths who thought he was dating a white woman. That was also the year of the “Central Park Five”, in which Trump played a central, disgraceful and racist role. I finished high school in Palo Alto, across a highway from East Palo Alto, and the difference between the cities seemed almost as stark as between the white and black neighborhoods in South Africa. I learned later that this was the result of redlining. My classmates and teachers in Palo Alto were obsessed, in 1989, with the injustices in South Africa, but never once discussed East Palo Alto with me or with each other. I was practicing for the SAT exams at the time and remember thinking Palo Alto : East Palo Alto = Pretoria : Mamelodi.

Three years after that, when I was an undergraduate student studying at Caltech in Los Angeles, the Rodney King beating happened. I saw a black man severely beaten on television in what looked like a clip borrowed from South Africa. My classmates at the time thought it would be exciting to drive to South Central Los Angeles to see the “rioters” up close. They had never visited those areas before,  nor did they return afterwards. I was reminded at the time of the poverty tourism my friends in South Africa would partake in: a tour to Soweto accompanied by guides with guns to see for oneself how blacks lived. Then right back home for a braai (BBQ). My classmates came back from their Rodney King tour excitedly telling stories of violence and dystopia. Then they partied into the night.

I thought about my only classmate, one out of 200, who was actually from South Los Angeles and about the dissonance that was his life and my classmates’ partying.

Now I am a professor, and I am frequently present in discussions on issues such as undergraduate and graduate admissions, and hiring. Faculty talk a lot, sometimes seemingly endlessly, about diversity, representation, gender balance, and so forth and so on. But I’ve been in academia for 20+ years and it was only three years ago, after moving to Caltech, that I attended a faculty meeting with a black person for the first time. Sometimes I look around during faculty meetings and wonder if I am in America or South Africa? How can I tell?

Racism

Today is an opportunity for academics to reflect on the murder of George Floyd, and to ask difficult questions of themselves. It’s not for me to say what all the questions are or ought to be. I will say this: at a time when everything is unprecedented (Trump’s tweets, the climate, the stock market, the pandemic, etc. etc.) the murder of George Floyd was completely precedented. His words. The mode of murder. The aftermath. It has happened many times before, including recently. And so it is in academia. The fundamental racism, the idea that black students, staff, and faculty, are not truly as capable as whites, it’s simply a day-to-day reality in academia, despite all the talk and rhetoric to the contrary. Did any academics, upon hearing of the murder of George Floyd, worry immediately that it was one of their colleagues, George Floyd, Ph.D., working at the University of Minnesota who was killed?

I will take the time today to read. I will pick up Long Walk to Freedom, and I will also read #BlackintheIvory. I may read some Alan Paton. I will pause to think about how my university can work to improve the recruitment, mentoring, and experience of black students, staff and faculty. Just some ideas.

All these years since leaving South Africa I’ve had a recurring dream. I fly around Pretoria. The sun has just set and the Union Buildings are lit up, glowing a beautiful orange in the distance. The city is empty. My friends are not there. The man I saw pleading for his life in 1985 is gone. I wonder what the police did to him when he arrived at the police station. I wonder whether he died there, like many blacks at the time did. I fly nervously, trying to remember whether I have my passbook on me. I remember I’m classified white and I don’t need a passbook. I hear dogs barking and wonder where they are, because the city is empty. I wonder what it will feel like when they eat me, and then I remember I’m white and I’m not their target. I hope that I don’t encounter them anyway, and I realize what a privilege it is to be able to fly where they can’t reach me. Then I notice that I’m slowly falling, and barely clearing the slopes of Muckleneuk hill. I realize I will land and am happy about that. I slowly halt my run as my feet gently touch the ground.

The widespread establishment of statistics departments in the United States during the mid-20th century can be traced to a presentation by Harold Hotelling in the Berkeley Symposium on Mathematical Statistics and Probability in 1945. The symposium, organized by Berkeley statistician Jerzy Neyman, was the first of six such symposia that took place every five years, and became the most influential meetings in statistics of their time. Hotelling’s lecture on “The place of statistics in the university” inspired the creation of several statistics departments, and at UC Berkeley, Neyman’s establishment of the statistics department in the 1950s was a landmark moment for statistics in the 20th century.

Neyman was hired in the mathematics department at UC Berkeley by a visionary chair, Griffith Evans, who transformed the UC Berkeley math department into a world-class institution after his hiring in 1934. Evans’ vision for the Berkeley math department included statistics, and Eric Lehmann’s history of the UC Berkeley statistics department details how Evans’ commitment to diverse areas in the department led him to hire Neyman without even meeting him. However, Evans’ progressive vision for mathematics was not shared by all of his colleagues, and the conservative, parochial attitudes of the math department contributed to Neyman’s breakaway and eventual founding of the statistics department. This dynamic was later repeated at universities across the United States, resulting in a large gulf between mathematicians and statisticians (ironically, history may be repeating itself, with some now suggesting that the emergence of “data science” is a result of conservatism among statisticians leading them to cling to theory rather than to care about data).

The divide between mathematicians and statisticians is unfortunate for a number of reasons, one of them being that statistical literacy is important even for the purest of the pure mathematicians. A recent debate on the appropriateness of diversity statements for job applicants in mathematics highlights the need: analysis of data, specifically data on who is in the maths community, and their opinions on the issue, turns out to be central to understanding the matter at hand. Case in point is a recent preprint by two mathematicians:

Joshua Paik and Igor Rivin, Data Analysis of the Responses to Professor Abigail Thompson’s Statement on Mandatory Diversity Statements, arXiv, 2020.

This statistics preprint attempts to identify the defining attributes of mathematicians who signed recent letters related to diversity statement requirements in mathematics job searches. I was recently asked to provide feedback on the manuscript, ergo this blog post.

### Reproducibility

In order to assess the results of any preprint or paper, it is essential, as a first step, to be able to reproduce the analysis and results. In the case of a preprint such as this one, this means having access to the code and data used to produce the figures and to perform the calculations. I applaud the authors for being fully transparent and making available all of their code and data in a Github repository in a form that made it easy to reproduce all of their results; indeed I was able to do so without any problems. 👏

### The dataset

The preprint analyzes data on signatories of three letters submitted in response to an opinion piece on diversity statement requirements for job applicants published by Abigail Thompson, chair of the mathematics department at UC Davis. Thompson’s letter compared diversity statement requirements of job applicants to loyalty oaths required during McCarthyism. The response letters range from strong affirmation of Thompson’s opinions, to strong refutation of them. Signatories of “Letter A”, titled “The math community values a commitment to diversity”, “strongly disagreed with the sentiments and arguments of Dr. Thompson’s editorial” and are critical of the AMS for publishing her editorial. Signatories of “Letter B”, titled “Letter to the editor”, worry about “direct attempt[s] to destroy Thompson’s career and attempt[s] to intimidate the AMS”. Signatories of “Letter C”, titled “Letter to the Notices of the AMS”, write that they “applaud Abigail Thompson for her courageous leadership [in publishing her editorial]” and “agree wholeheartedly with her sentiments.”

The dataset analyzed by Paik and Rivin combines information scraped from Google Scholar and MathSciNet with data associated to the signatories that was collated by Chad Topaz. The dataset is available in .csv format here.

### The Paik and Rivin result

The main result of Paik and Rivin is summarized in the first paragraph of their Conclusion and Discussion section:

“We see the following patterns amongst the “established” mathematicians who signed the three letters: the citations numbers distribution of the signers of Letter A is similar to that of a mid-level mathematics department (such as, say, Temple University), the citations metrics of Letter B are closer to that of a top 20 department such as Rutgers University, while the citations metrics of the signers of Letter C are another tier higher, and are more akin to the distribution of metrics for a truly top department.”

A figure from their preprint summarizing the data supposedly supporting their result is reproduced below (with the dotted blue line shifted slightly to the right after the bug fix):

Paik and Rivin go a step further, using citation counts and h-indices as proxies for “merit in the judgement of the community.” That is to say, Paik and Rivin claim that mathematicians who signed letter A, i.e. those who strongly disagreed with Thompson’s equivalence between diversity statements and McCarthy’s loyalty oaths, have less “merit in the judgement of the community” than mathematicians who signed letter C, i.e. those who agreed wholeheartedly with her sentiments.

The differential is indeed very large. Paik and Rivin find that the mean number of citations for signers of Letter A is 2397.75, the mean number of citations for signers of Letter B is 4434.89, and the mean number of citations for signers of Letter C is 6226.816. To control for an association between seniority and number of citations, the computed averages are based only on citation counts of full professors. [Note: a bug in the Paik-Rivin code results in an error in their reporting for the mean for group B. They report 4136.432 whereas the number is actually 4434.89.]

This data seems to support Paik and Rivin’s thesis that mathematicians who support the use of diversity statements in hiring, and who strongly disagree with Thompson’s analogy of such statements to McCarthy’s loyalty oaths, are second rate mathematicians, whereas those who agree wholeheartedly with Thompson are on par with professors at “truly top departments”.

But do the data really support this conclusion?

### A fool’s errand

Before delving into the details of the data Paik and Rivin analyzed, it is worthwhile to pause and consider the validity of using citations counts and h-indices as proxies for “merit in the judgement of the community”. The authors themselves note that “citations and h-indices do not impose a total order on the quality of a mathematician” and emphasize that “it is quite obvious that, unlike in competitive swimming, imposing such an order is a fool’s errand.” Yet they proceed to discount their own advice, and wholeheartedly embark on the fool’s errand they warn against. 🤔

I examined the mathematicians in their dataset and first, as a sanity check, confirmed that I am one of them (I signed one of the letters). I then looked at the associated citation counts and noticed that out of 1435 mathematicians who signed the letters, I had the second highest number of citations according to Google Scholar (67,694), second only to Terence Tao (71,530). We are in the 99.9th percentile. 👏 Moreover, I have 27 times more citations than Igor Rivin. According to Paik and Rivin this implies that I have 27 times more merit in the judgement of our peers. I should say at least 27 times, because one might imagine that the judgement of the community is non-linear in the number of citations. Even if one discounts such quantitative comparisons (Paik and Rivin do note that Stephen Smale has fewer citations than Terence Tao, and that it would be difficult on that basis alone to conclude that Tao is the better mathematician), the preprint makes use of citation counts to assess “merit in the judgement of the community”, and thus according to Paik and Rivin my opinions have substantial merit. In fact, according to them, my opinion on diversity statements must be an extremely meritorious one. I imagine they would posit that my opinion on the debate that is raging in the math community regarding diversity statement requirements from job applicants is the correct, and definitive one. Now I can already foresee protestations that, for example, my article on “Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation” which has 9,438 citations is not math per se, and that it shouldn’t count. I’ll note that my biology colleagues, after reading the supplement, think it’s a math paper, but in any case, if we are going to head down that route shouldn’t Paik and Rivin read the paper to be sure? 
And shouldn’t they read every paper of mine, and every paper of every signatory to determine it is valid for their analysis? And shouldn’t they adjust the citation counts of every signatory? Well they didn’t do any of that, and besides, they included me in their analysis so… I proceed…

The citation numbers above are based on Google Scholar citations. Paik and Rivin also analyze MathSciNet citations and state that they prefer them because “only published mathematics are in MathSciNet, and is hence a higher quality data source when comparing mathematicians.” I checked the relationship between Scholar and MathSciNet citations and found that, not surprisingly, they have a correlation of 0.92:

I’d say they are therefore interchangeable in terms of the authors’ use of them as a proxy for “merit”.
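
For readers who want to run this kind of check themselves, here is a minimal sketch of a Pearson correlation computed only over signatories that have both citation counts. The rows are made-up toy values, not the Paik-Rivin dataset, and the tuple layout (Scholar count, MathSciNet count) is an assumption chosen for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Toy rows (made up): (scholar_citations, mathscinet_citations); None = missing.
rows = [(1200, 400), (300, None), (5000, 1500), (None, 250), (800, 310), (2500, 700)]

# Keep only rows where both sources are present before correlating.
complete = [(s, m) for s, m in rows if s is not None and m is not None]
r = pearson([s for s, _ in complete], [m for _, m in complete])
print(f"{len(complete)} complete rows, r = {r:.2f}")
```

Note that dropping incomplete rows is itself a choice: with most values missing, the correlation only describes the signatories who happen to appear in both sources.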

But citations are not proxies for merit. The entire premise of the preprint is ridiculous. Furthermore, even if it was true that citations were a meaningful attribute of the signatories to analyze, there are many other serious problems with the preprint.

### The elephant not in the room

Paik and Rivin begin their preprint with a cursory examination of the data and immediately identify a potential problem… missing data. How much data is missing? 64.11% of individuals do not have associated Google Scholar citation data, and 78.82% don’t have MathSciNet citation data. Paik and Rivin brush this issue aside remarking that “while this is not optimal, a quick sample size calculation shows that one needs 303 samples or 21% of the data to produce statistics at a 95% confidence level and a 5% confidence interval.” They are apparently unaware of the need for uniform population sampling, and don’t appear to even think for a second of the possible ascertainment biases in their data. I thought for a second.
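
The quoted figure of 303 is consistent with the textbook sample size formula for estimating a proportion, with a finite population correction applied to the 1435 signatories. That this is the calculation the authors used is an assumption; the preprint quote does not say. A minimal sketch:

```python
from statistics import NormalDist

def required_sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    Assumes simple random (uniform) sampling from the population.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n0 = z * z * p * (1 - p) / margin ** 2               # infinite-population size
    return n0 / (1 + (n0 - 1) / population)              # finite population correction

n = required_sample_size(1435)
print(round(n), f"{n / 1435:.0%}")
```

The formula is only meaningful under the simple random sampling assumption noted in the docstring. Signatories who happen to have a Google Scholar page are not a random sample of all signatories, so reaching the required count says nothing about bias.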

For instance, I wondered whether there might be a discrepancy between the number of citations of women with Google Scholar pages vs. women without such pages. This is because I’ve noticed anecdotally that several senior women mathematicians I know don’t have Google Scholar pages, and since senior scientists presumably have more citations this could create a problematic ascertainment bias. I checked and there is, as expected, some correlation between age post-Ph.D. and citation count (cor = 0.36):

To test whether there is an association between presence of a Google Scholar page and citation number I examined the average number of MathSciNet citations of women with and without Google Scholar pages. Indeed, the average number of MathSciNet citations of women with a Google Scholar page is much higher than that of women without one (898 vs. 621). For men the difference is much smaller (1816 vs. 1801). By the way, the difference in citation number between men and women is itself large, and can be explained by a number of differences, starting with the fact that the women represented in the database have much lower age post-Ph.D. than the men (17.6 vs. 26.3), and therefore fewer citations (see correlation between age and citations above).
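
An ascertainment check of this kind amounts to comparing group means across the "has a Google Scholar page" flag. The sketch below uses made-up toy records and hypothetical field names (`gender`, `has_scholar`, `mathscinet_citations`); it is not the code or data behind the numbers above.

```python
from collections import defaultdict
from statistics import mean

def mean_by_group(records, value_key, group_keys):
    """Mean of value_key within each combination of group_keys, skipping missing values."""
    groups = defaultdict(list)
    for rec in records:
        if rec[value_key] is None:
            continue
        groups[tuple(rec[k] for k in group_keys)].append(rec[value_key])
    return {group: mean(values) for group, values in groups.items()}

# Toy records (made up); None marks a missing MathSciNet count.
records = [
    {"gender": "F", "has_scholar": True, "mathscinet_citations": 900},
    {"gender": "F", "has_scholar": True, "mathscinet_citations": 700},
    {"gender": "F", "has_scholar": False, "mathscinet_citations": 500},
    {"gender": "F", "has_scholar": False, "mathscinet_citations": None},
    {"gender": "M", "has_scholar": True, "mathscinet_citations": 1800},
    {"gender": "M", "has_scholar": False, "mathscinet_citations": 1700},
]

means = mean_by_group(records, "mathscinet_citations", ["gender", "has_scholar"])
for group, value in sorted(means.items()):
    print(group, value)
```

A large gap between the `True` and `False` rows within one gender is a sign that restricting to people with Scholar pages gives a biased picture of that group.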

The analysis above suggests that perhaps one should use MathSciNet citation counts instead of Google Scholar. However the extent of missing data for that attribute is highly problematic (78.82% missing values). For one thing, my own MathSciNet citation counts are missing, so there were probably bugs in the scraping. The numbers are also tiny. There are only 46 women with MathSciNet data among all three letter signatories out of 452 women signatories. I believe the data is unreliable. In fact, even my ascertainment bias analysis above is problematic due to the small number of individuals involved. It would be completely appropriate at this point to accept that the data is not of sufficient quality for even rudimentary analysis. Yet the authors continued.

### A big word

Confounder is a big word for a variable that influences both the dependent and independent variable in an analysis, thus causing a spurious association. The word does not appear in Paik and Rivin’s manuscript, which is unfortunate because it is in fact a confounder that explains their main “result”.  This confounder is age. I’ve already shown the strong relationship between age post-Ph.D. and citation count in a figure above. Paik and Rivin examine the age distribution of the different letter signatories and find distinct differences. The figure below is reproduced from their preprint:

The differences are stark: the mean time since PhD completion of signers of Letter A is 14.64 years, the mean time since PhD completion of signers of Letter B is 27.76 years, and the mean time since PhD completion of signers of Letter C is 35.48 years. Presumably to control for this association, Paik and Rivin restricted the citation count computations to full professors. As it turns out, this restriction alone does not control for age.

The figure below shows the number of citations of letter C signatories who are full professors as a function of their age:

The red line at 36 years post-Ph.D. divides two distinct regimes. The large jump at that time (corresponding to chronological age ~60) is not surprising: senior professors in mathematics are more famous and have more influence than their junior peers, and their work has had more time to be understood and appreciated. In mathematics results can take many years before they are understood and integrated into mainstream mathematics. These are just hypotheses, but the exact reason for this effect is not important for the Paik-Rivin analysis. What matters is that there are almost no full professors among Letter A signers who are more than 36 years post-Ph.D. In fact, the number of such individuals (filtered for those who have published at least 1 article), is 2. Two individuals. That’s it.

Restricting the analysis to full professors less than 36 years post-Ph.D. tells a completely different story to the one Paik and Rivin peddle. The average number of citations of full professors who signed letter A (2922.72) is higher than the average number of citations of full professors who signed letter C (2348.85). Signers of letter B have 3148.83 citations on average. The figure for this analysis is shown below:

The main conclusion of Paik and Rivin, that signers of letter A have less merit than signers of letter B, who in turn have less merit than signers of letter C, can be seen to be complete rubbish. What the data reveal is simply that the signers of letter A are younger than the signers of the other two letters.
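
The effect of the age restriction can be illustrated with a small sketch. The records below are made-up toy values chosen to show how a comparison can flip once a confounder is controlled for. They are not the real data, and the field names (`letter`, `rank`, `years_since_phd`, `citations`) are hypothetical.

```python
from statistics import mean

def mean_citations(records, letter, max_years=None):
    """Mean citations of full professors who signed `letter`.

    If max_years is given, keep only those fewer than max_years post-Ph.D.
    Returns None when no record matches.
    """
    values = [
        r["citations"]
        for r in records
        if r["letter"] == letter
        and r["rank"] == "full"
        and (max_years is None or r["years_since_phd"] < max_years)
    ]
    return mean(values) if values else None

# Toy records (made up): letter A signers are younger, letter C has a senior tail.
records = [
    {"letter": "A", "rank": "full", "years_since_phd": 10, "citations": 2000},
    {"letter": "A", "rank": "full", "years_since_phd": 20, "citations": 3500},
    {"letter": "A", "rank": "full", "years_since_phd": 30, "citations": 3000},
    {"letter": "A", "rank": "assistant", "years_since_phd": 4, "citations": 150},
    {"letter": "C", "rank": "full", "years_since_phd": 15, "citations": 1500},
    {"letter": "C", "rank": "full", "years_since_phd": 25, "citations": 2500},
    {"letter": "C", "rank": "full", "years_since_phd": 40, "citations": 9000},
    {"letter": "C", "rank": "full", "years_since_phd": 45, "citations": 12000},
]

for label, cutoff in [("all full professors", None), ("under 36 years post-Ph.D.", 36)]:
    a = mean_citations(records, "A", cutoff)
    c = mean_citations(records, "C", cutoff)
    print(f"{label}: A = {a:.0f}, C = {c:.0f}")
```

In this toy example the unrestricted means favor letter C only because of the two most senior signers; once both groups are limited to the same age range, the ordering reverses.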

Note: I have performed my analysis in a Google Colab notebook accessible via the link. It allows for full reproducibility of the figures and numbers in this post, and facilitates easy exploration of the data. Of course there’s nothing to explore. Use of citations as a proxy for merit is a fool’s errand.

### Miscellanea

There are numerous other technical problems with the preprint. The authors claim to have performed “a control” (they didn’t). Several p-values are computed and reported without any multiple testing correction. Parametric approximations for the citation data are examined, but then ignored. Moreover, appropriate zero-inflated count distributions for such data are never considered (see e.g. Yong-Gil et al. 2007). The results presented are all univariate (e.g. histograms of one data type); there is not a single scatterplot in the preprint! This suggests that the authors are unaware of the field of multivariate statistics. Considering all of this, I encourage the authors to enroll in an introductory statistics class.
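
As an illustration of what a multiple testing correction involves, here is a minimal sketch of two common adjustments, Bonferroni and Benjamini-Hochberg. Which correction would suit the preprint is not stated above, so the choice of methods is an assumption, and the p-values are made up.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjustment.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p-values (made up) from four hypothetical tests.
pvals = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

With four tests, a raw p-value of 0.04 no longer clears a 0.05 threshold under either adjustment, which is why unadjusted p-values reported in bulk are hard to interpret.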

### The Russians

In a strange final paragraph of the Conclusion and Discussion section of their preprint, Paik and Rivin speculate on why mathematicians from communist countries are not represented among the signers of letter A. They present hypotheses without any data to back up their claims.

The insistence that some mathematicians, e.g. Mikhail Gromov who signed letters B and C and is a full member at IHES and professor at NYU, are not part of the “power elite” of mathematics is just ridiculous. Furthermore, characterizing someone like Gromov, who arrived in the US from Russia to an arranged job at SUNY Stony Brook (thanks to Tony Phillips), as being a member of a group who “arrived at the US with nothing but the shirts on their backs” is bizarre.

### Diversity matters

I find the current debate in the mathematics community surrounding Prof. Thompson’s letter very frustrating. The comparison of diversity statements to McCarthy’s loyalty oaths is ridiculous. Instead of debating such nonsense, mathematicians need to think long and hard about how to change the culture in their departments, a culture that has led to appallingly few under-represented minorities and women in the field. Under-represented minorities and women routinely face discrimination and worse. This is completely unacceptable.

The preprint by Paik and Rivin is a cynical attempt to use the Thompson kerfuffle to advertise the tired trope of the second-rate mathematician being the one to advocate for greater diversity in mathematics. It’s a sad refrain that has a long history in mathematics. But perhaps it’s not surprising. The colleagues of Jerzy Neyman in his mathematics department could not even stomach a statistician, let alone a woman, let alone a person from an under-represented minority group. However I’m optimistic reading the list of signatories of letter A. Many of my mathematical heroes are among them. The future is theirs, and they are right.
