Lior Pachter

Division of Biology and Biological Engineering & Department of Computing and Mathematical Sciences, California Institute of Technology

**Abstract**

A recently published pilot study on the efficacy of 25-hydroxyvitamin D3 (calcifediol) in reducing ICU admission of hospitalized COVID-19 patients concluded that the treatment “seems able to reduce the severity of disease, but larger trials with groups properly matched will be required to show a definitive answer”. In a follow-up paper, Jungreis and Kellis re-examine this so-called “Córdoba study” and argue that the authors of the study have undersold their results. Based on a reanalysis of the data in a manner they describe as “rigorous” and using “well established statistical techniques”, they urge the medical community to “consider testing the vitamin D levels of all hospitalized COVID-19 patients, and taking remedial action for those who are deficient.” Their recommendation is based on two claims: in an examination of unevenness in the distribution of one of the comorbidities between cases and controls, they conclude that there is “no evidence of incorrect randomization”, and they present a “mathematical theorem” to make the case that the effect size in the Córdoba study is significant to the extent that “they can be confident that if assignment to the treatment group had no effect, we would not have observed these results simply due to chance.”

Unfortunately, the “mathematical analysis” of Jungreis and Kellis is deeply flawed, and their “theorem” is vacuous. Their analysis cannot be used to conclude that the Córdoba study shows that calcifediol significantly reduces ICU admission of hospitalized COVID-19 patients. Moreover, the Córdoba study is fundamentally flawed, and therefore there is nothing to learn from it.

**The Córdoba study**

The Córdoba study, described by the authors as a pilot, was ostensibly a randomized controlled trial, designed to determine the efficacy of 25-hydroxyvitamin D3 in reducing ICU admission of hospitalized COVID-19 patients. The study consisted of 76 patients hospitalized for COVID-19 symptoms, with 50 of the patients treated with calcifediol, and 26 not receiving treatment. Patients were administered “standard care”, which according to the authors consisted of “a combination of hydroxychloroquine, azithromycin, and for patients with pneumonia and NEWS score 5, a broad spectrum antibiotic”. Crucially, admission to the ICU was determined by a “Selection Committee” consisting of intensivists, pulmonologists, internists, and members of an ethics committee. The Selection Committee based ICU admission decisions on the evaluation of several criteria, including presence of comorbidities, and the level of dependence of patients according to their needs and clinical criteria.

The result of the Córdoba trial was that only 1/50 of the treated patients was admitted to the ICU, whereas 13/26 of the untreated patients were admitted (p-value = $7.7 \times 10^{-7}$ by Fisher’s exact test). This is a minuscule p-value but it is meaningless. Since there is no record of the Selection Committee deliberations, it is impossible to know whether the ICU admission of the 13 untreated patients was due to their previous high blood pressure comorbidity. Perhaps the 11 treated patients with the comorbidity were not admitted to the ICU because they were older, and the Selection Committee considered their previous higher blood pressure to be more “normal” (14/50 treatment patients were over the age of 60, versus only 5/26 of the untreated patients).
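The quoted p-value can be reproduced directly from the two admission counts. The following is a minimal sketch using only the Python standard library, not code from either paper; the function names are illustrative, and the two-sided p-value follows the common convention of summing all tables no more probable than the observed one.

```python
from math import comb

def table_probability(k, treated, untreated, admitted):
    """Hypergeometric probability that exactly k of the ICU admissions are treated patients."""
    return comb(treated, k) * comb(untreated, admitted - k) / comb(treated + untreated, admitted)

def fisher_exact(treated_icu, treated_total, untreated_icu, untreated_total):
    """Return (one-sided, two-sided) Fisher exact p-values for a 2x2 table.

    One-sided: a treated ICU count as small as observed, or smaller.
    Two-sided: all tables no more probable than the observed one.
    """
    admitted = treated_icu + untreated_icu
    k_min = max(0, admitted - untreated_total)
    k_max = min(treated_total, admitted)
    probs = {k: table_probability(k, treated_total, untreated_total, admitted)
             for k in range(k_min, k_max + 1)}
    observed = probs[treated_icu]
    one_sided = sum(p for k, p in probs.items() if k <= treated_icu)
    # Small relative tolerance guards against floating-point ties.
    two_sided = sum(p for p in probs.values() if p <= observed * (1 + 1e-9))
    return one_sided, two_sided

# 1 of 50 treated and 13 of 26 untreated patients admitted to the ICU.
p_one, p_two = fisher_exact(1, 50, 13, 26)
print(f"one-sided p = {p_one:.2g}, two-sided p = {p_two:.2g}")  # both about 7.7e-07
```

For this table the one-sided and two-sided p-values coincide, because every table in the opposite tail is more probable than the observed one.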

Figure 1: Table 2 from [1] showing the comorbidities of patients. It is reproduced by virtue of [1] being published open access under the CC-BY license.

The fact that admission to the ICU could be decided in part based on the presence of comorbidities, and that there was a significant imbalance in one of the comorbidities, immediately renders the study results meaningless. There are several other problems with it that potentially confound the results: the study did not examine the Vitamin D levels of the treated patients, nor was the untreated group administered a placebo. Most importantly, the study numbers were tiny, with only 76 patients examined. Small studies are notoriously problematic, and are known to produce large effect sizes [9]. Furthermore, sloppiness in the study does not lead to confidence in the results. The authors state that the “rigorous protocol” for determining patient admission to the ICU is available as Supplementary Material, but there is no Supplementary Material distributed with the paper. There is also an embarrassing typo: Fisher’s exact test is referred to twice as “Fischer’s test”. To err once in describing this classical statistical test may be regarded as misfortune; to do it twice looks like carelessness.

**A pointless statistics exercise**

The Córdoba study has not received much attention, which is not surprising considering that by the authors’ own admission it was a pilot that at best only motivates a properly matched and powered randomized controlled trial. Indeed, the authors mention that such a trial (the COVIDIOL trial), with data being collected from 15 hospitals in Spain, is underway. Nevertheless, Jungreis and Kellis [3], apparently mesmerized by the $7.7 \times 10^{-7}$ p-value for ICU admission upon treatment, felt the need to “rescue” the study with what amounts to faux statistical gravitas. They argue for immediate consideration of testing Vitamin D levels of hospitalized patients, so that “deficient” patients can be administered some form of Vitamin D “to the extent it can be done safely”. Their message has been noticed; only a few days after [3] appeared, the authors’ tweet promoting it had been retweeted more than 50 times [8].

Jungreis and Kellis claim that the p-value for the effect of calcifediol on patients is so significant that in and of itself it merits belief that administration of calcifediol does, in fact, prevent admission of patients to ICUs. To make their case, Jungreis and Kellis begin by acknowledging that imbalance between the treated and untreated groups in the previous high blood pressure comorbidity may be a problem, but claim that there is “no evidence of incorrect randomization.” Their argument is as follows: they note that while the p-value for the imbalance in the previous high blood pressure comorbidity is 0.0023, it should be adjusted for the fact that there are 15 distinct comorbidities, and that just by chance, when computing so many p-values, one might be small. First, an examination of Table 2 in [1] (Figure 1) shows that there were only 14 comorbidities assessed, as none of the patients had previous chronic kidney disease. Thus, the number 15 is incorrect. Second, Jungreis and Kellis argue that a Bonferroni correction should be applied, and that this correction should be based on 30 tests (= 15 × 2). The reason for the factor of 2 is that they claim that when testing for imbalance, one should test for imbalance in both directions. By applying the Bonferroni correction to the p-values, they derive a “corrected” p-value for previous high blood pressure being imbalanced between groups of 0.069. They are wrong on several counts in deriving this number. To illustrate the problems we work through the calculation step-by-step:

The question we want to answer is as follows: given that there are multiple comorbidities, is there a significant imbalance in *at least* one comorbidity? There are several ways to test for this, with the simplest being Šidák’s correction [10] given by

$$1 - (1 - m)^n,$$

where *m* is the minimum p-value among the comorbidities, and *n* is the number of tests. Plugging in *m = 0.0023* (the smallest p-value in Table 2 of [1]) and *n = 14* (the number of comorbidities) one gets 0.032 (note that the Bonferroni correction used by Jungreis and Kellis is the first-order Taylor approximation to the Šidák correction when *m* is small). The Šidák correction is based on an assumption that the tests are independent. However, that is certainly not the case in the Córdoba study. For example, having at least one prognostic factor is one of the comorbidities tabulated. In other words, the p-value obtained is conservative. The calculation above uses *n = 14*, but Jungreis and Kellis reason that the number of tests is 30 = 15 × 2, to take into account an imbalance in either the treated or untreated direction. Here they are assuming two things: that two-sided tests for each comorbidity will produce double the p-value of a one-sided test, and that two-sided tests are the “correct” tests to perform. They are wrong on both counts. First, the two-sided Fisher exact test does not, in general, produce a p-value that is double that of the one-sided test. The study result is a good example: 1/50 treated patients admitted to the ICU vs. 13/26 untreated patients produces a p-value of $7.7 \times 10^{-7}$ for both the one-sided and two-sided tests. Jungreis and Kellis do not seem to know this can happen, nor understand why; they go to great lengths to explain the importance of conducting a one-sided test for the study result. Second, there is a strong case to be made that a one-sided test is the correct test to perform for the comorbidities. The concern is not whether there was an imbalance of any sort, but whether the imbalance would skew results by virtue of the study including too many untreated individuals with comorbidities. In any case, if one were to give Jungreis and Kellis the benefit of the doubt, and perform a two-sided test, the corrected p-value for the previous high blood pressure comorbidity is 0.06 and not 0.069.
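The numbers in this calculation are easy to check. Below is a minimal sketch, with illustrative function names, that evaluates the Šidák and Bonferroni corrections for the minimum p-value 0.0023. The choice of 28 tests (14 comorbidities × 2 directions) for the two-sided case is an assumption about how the 0.06 figure arises, since the text does not spell it out.

```python
def sidak(min_p, n_tests):
    """Šidák-corrected p-value: chance that at least one of n independent tests is this small."""
    return 1.0 - (1.0 - min_p) ** n_tests

def bonferroni(min_p, n_tests):
    """Bonferroni-corrected p-value, the first-order approximation n * m, capped at 1."""
    return min(1.0, n_tests * min_p)

m = 0.0023  # smallest comorbidity p-value (previous high blood pressure)

one_sided_14 = sidak(m, 14)         # 14 comorbidities, one-sided tests
two_sided_28 = sidak(m, 28)         # assumed: 14 comorbidities x 2 directions
bonferroni_30 = bonferroni(m, 30)   # the Jungreis and Kellis calculation: 15 x 2 tests

print(round(one_sided_14, 3))   # 0.032
print(round(two_sided_28, 3))   # 0.062
print(round(bonferroni_30, 3))  # 0.069
```

The Šidák value never exceeds the Bonferroni value for the same inputs, which is why using the approximation with an inflated test count pushes the corrected p-value upward.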

The most serious mistake that Jungreis and Kellis make, however, is in claiming that one can accept the null hypothesis of a hypothesis test when the p-value is greater than 0.05. The p-value they obtain is 0.069 which, even if it is taken at face value, is not grounds for claiming, as Jungreis and Kellis do, that “this is not significant evidence that the assignment was not random” and reason to conclude that there is “no evidence of incorrect randomization”. That is not how p-values work. A p-value less than 0.05 allows one to reject the null hypothesis (assuming 0.05 is the threshold chosen), but a p-value above the chosen threshold is not grounds for accepting the null. Moreover, the correctly computed p-value (one-sided tests over 14 comorbidities) is 0.032, which is certainly grounds for rejecting the null hypothesis that the assignment to groups was random.

Correction of the incorrect Jungreis and Kellis statistics may be a productive exercise in introductory undergraduate statistics for some, but it is pointless insofar as assessing the Córdoba study. While the extreme imbalance in the previous high blood pressure comorbidity is problematic because patients with the comorbidity may be more likely to get sick and require ICU admission, the study was so flawed that the exact p-value for the imbalance is a moot point. Given that the presence of comorbidities, not just their effect on patients, was a factor in determining which patients were admitted to the ICU, the extreme imbalance in the previous high blood pressure comorbidity renders the result of the study meaningless *ex facie*.

**A definition is not a theorem is not proof of efficacy**

In an effort to fend off criticism that the comorbidities of patients were improperly balanced in the study, Jungreis and Kellis go further and present a “theorem” they claim shows that there was a minuscule chance that an uneven distribution of comorbidities could render the study results not significant. The “theorem” is stated twice in their paper, and I’ve copied both theorem statements verbatim from their paper:

**Theorem 1** *In a randomized study, let p be the p-value of the study results, and let q be the probability that the randomization assigns patients to the control group in such a way that the values of $P_{\text{prognostic}}(\text{Patient})$ are sufficiently unevenly distributed between the treatment and control groups that the result of the study would no longer be statistically significant at the 95% level after controlling for the prognostic risk factors. Then $q < \frac{p}{0.05}$.*

According to Jungreis and Kellis, $P_{\text{prognostic}}(\text{Patient})$ is the following: “There can be any number of prognostic risk factors, but if we knew what all of them were, and their effect sizes, and the interactions among them, we could combine their effects into a single number for each patient, which is the probability, based on all known and yet-to-be discovered risk factors at the time of hospital admission, that the patient will require ICU care if not given the calcifediol treatment. Call this (unknown) probability $P_{\text{prognostic}}(\text{Patient})$.”

The theorem is restated in the Methods section of the Jungreis and Kellis paper as follows:

**Theorem 2** *In a randomized controlled study, let p be the p-value of the study outcome, and let q be the probability that the randomization distributes all prognostic risk factors combined sufficiently unevenly between the treatment and control groups that when controlling for these prognostic risk factors the outcome would no longer be statistically significant at the 95% level. Then $q < \frac{p}{0.05}$.*

While it is difficult to decipher the language the “theorem” is written in, let alone its meaning (note Theorem 1 and Theorem 2 are supposedly the same theorem), I was able to glean something about its content from reading the “proof”. The mathematical content of whatever the theorem is supposed to mean, is the definition of conditional probability, namely that if $A$ and $B$ are events with $P(B) > 0$, then

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

To be fair to Jungreis and Kellis, the “theorem” includes the observation that

$$P(A \cap B) \leq P(A).$$

This is not, by any stretch of the imagination, a “theorem”; it is literally the definition of conditional probability followed by an elementary inequality. The most generous interpretation of what Jungreis and Kellis were trying to do with this “theorem”, is that they were showing that the p-value for the study is so small, that it is small even after being multiplied by 20. There are less generous interpretations.
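One way to spell out that generous reading is sketched below. It is an interpretation under assumed notation, not a quotation of the proof in [3]: the event names $R$ and $U$ are labels introduced here for illustration. Let $R$ be the event that, with no treatment effect, the study produces a result at least as extreme as the one observed, so that $P(R) = p$. Let $U$ be the event that the randomization is uneven enough to make the result non-significant, so that $P(U) = q$ and, by the choice of $U$, $P(R \mid U) \geq 0.05$. Then

$$p = P(R) \geq P(R \cap U) = P(R \mid U)\,P(U) \geq 0.05\,q, \qquad \text{so} \qquad q \leq \frac{p}{0.05} = 20p.$$

With $p = 7.7 \times 10^{-7}$ this gives $q \leq 1.54 \times 10^{-5}$, which is the study p-value multiplied by 20 and nothing more.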

**Does Vitamin D intake reduce ICU admission?**

There has been a lot of interest in Vitamin D and its effects on human health over the past decade [2], and much speculation about its relevance for COVID-19 susceptibility and disease severity. One interesting result on disease susceptibility was published recently: in a study of 489 patients, it was found that the relative risk of testing positive for COVID-19 was 1.77 times greater for patients with likely deficient vitamin D status compared with patients with likely sufficient vitamin D status [7]. However, definitive results on Vitamin D and its relationship to COVID-19 will have to await larger trials. One such trial, a large randomized clinical trial with 2,700 individuals sponsored by Brigham and Women’s Hospital, is currently underway [4]. While this study might shed some light on Vitamin D and COVID-19, it is prudent to keep in mind that the outcome is not certain. Vitamin D levels are confounded with many socioeconomic factors, making the identification of causal links difficult. In the meantime, it has been suggested that it makes sense for individuals to maintain reference nutrient intakes of Vitamin D [6]. Such a public health recommendation is not controversial.

As for Vitamin D administration to hospitalized COVID-19 patients reducing ICU admission, the best one can say about the Córdoba study is that nothing can be learned from it. Unfortunately, the poor study design, small sample size, availability of only summary statistics for the comorbidities, and imbalanced comorbidities among treated and untreated patients render the data useless. While it may be true that calcifediol administration to hospital patients reduces subsequent ICU admission, it may also not be true. Thus, the follow-up by Jungreis and Kellis is pointless at best. At worst, it is irresponsible propaganda, advocating for potentially dangerous treatment on the basis of shoddy arguments masked as “rigorous and well established statistical techniques”. It is surprising to see Jungreis and Kellis argue that it may be unethical to conduct a placebo randomized controlled trial, which is one of the most powerful tools in the development of safe and effective medical treatments. They write “the ethics of giving a placebo rather than treatment to a vitamin D deficient patient with this potentially fatal disease would need to be evaluated.” The evidence for such a policy is currently non-existent. On the other hand, there are plenty of known risks associated with excess Vitamin D [5].

**References**

1. Marta Entrenas Castillo, Luis Manuel Entrenas Costa, José Manuel Vaquero Barrios, Juan Francisco Alcalá Díaz, José López Miranda, Roger Bouillon, and José Manuel Quesada Gomez. Effect of calcifediol treatment and best available therapy versus best available therapy on intensive care unit admission and mortality among patients hospitalized for COVID-19: A pilot randomized clinical study. *The Journal of Steroid Biochemistry and Molecular Biology*, 203:105751, 2020.
2. Michael F Holick. Vitamin D deficiency. *New England Journal of Medicine*, 357(3):266–281, 2007.
3. Irwin Jungreis and Manolis Kellis. Mathematical analysis of Córdoba calcifediol trial suggests strong role for Vitamin D in reducing ICU admissions of hospitalized COVID-19 patients. *medRxiv*, 2020.
4. JoAnn E Manson. https://clinicaltrials.gov/ct2/show/nct04536298.
5. Ewa Marcinowska-Suchowierska, Małgorzata Kupisz-Urbańska, Jacek Łukaszkiewicz, Paweł Płudowski, and Glenville Jones. Vitamin D toxicity–a clinical perspective. *Frontiers in Endocrinology*, 9:550, 2018.
6. Adrian R Martineau and Nita G Forouhi. Vitamin D for COVID-19: a case to answer? *The Lancet Diabetes & Endocrinology*, 8(9):735–736, 2020.
7. David O Meltzer, Thomas J Best, Hui Zhang, Tamara Vokes, Vineet Arora, and Julian Solway. Association of vitamin D status and other clinical characteristics with COVID-19 test results. *JAMA Network Open*, 3(9):e2019722–e2019722, 2020.
8. Vivien Shotwell. https://tweetstamp.org/1327281999137091586.
9. Robert Slavin and Dewi Smith. The relationship between sample sizes and effect sizes in systematic reviews in education. *Educational Evaluation and Policy Analysis*, 31(4):500–506, 2009.
10. Lynn Yi, Harold Pimentel, Nicolas L Bray, and Lior Pachter. Gene-level differential analysis at transcript-level resolution. *Genome Biology*, 19(1):53, 2018.

## 17 comments

November 17, 2020 at 12:31 pm

Lior Pachter

This post was originally submitted to medRxiv, but was rejected from the preprint server on the basis that it constitutes a “rebuttal”.

November 17, 2020 at 1:07 pm

Emir Ozdemir

can someone explains this to me like I’m 5

November 17, 2020 at 1:49 pm

Anon

“is 0.6 and not 0.69.” is that supposed to be 0.069?

November 17, 2020 at 1:53 pm

Lior Pachter

Yes, thanks for finding the typo. I have fixed it.

November 17, 2020 at 3:35 pm

conradseitz

I am gratified to see your analysis of the “Cordoba study.” I wasn’t able, being a relative ignoramus, to see exactly what was wrong with it, but my suspicions were aroused by the study’s description of the “standard therapy”, i.e. the administration of hydroxychloroquine and azithromycin– this being decidedly nonstandard at the date the study was run. I thought those drugs were considered not helpful even in August, when the patients were treated.

You describe the imbalance between the treated and untreated groups as being related to the criteria for ICU admission, that is, the fact that a much larger proportion of untreated patients had a history of hypertension, which was one of the criteria for ICU admission. This would have led to more untreated patients being admitted to ICU simply because they had a history of hypertension.

The rest of it– their excuses for considering that issue irrelevant– flies right over my head, but I am getting used to that on your blog (I remind you that, as a retired GP, I am grossly ignorant and stupid to boot.)

Am I right in assuming that it is alright to continue taking supplemental vitamin D3 regardless of the study? Not to prevent severe COVID-19, but as a general health measure. I am aware that there is no good science that says taking supplemental vitamins as a general rule is useful, but there is no good science that says that it is bad for you, either. Vitamin D3 appears to be the only vitamin for which there exists even marginal evidence for benefits in general.

While there is certainly evidence for toxicity of vitamin D3, I was struck several years ago by its non-toxicity in a rather fragile patient who was inadvertently (due to pharmacy error) given 50,000 units of D3 a day instead of 400 units a day, with a normal dose of supplemental calcium, for several months. I couldn’t explain its failure to poison the patient at the time despite her assurance to me that she was faithfully taking it regularly. I only discovered the error when I inspected her medicine bottles at a routine office visit.

Just speaking in general terms, here, not arguing in favor of vitamin D3 for any specific benefits, nor in favor of deliberately taking massive doses.

Thanks for your rigorous analysis. I commiserate with you on your last post, “The lethal nonsense of Michael Levitt” and the amount of fringe argument it appears to have stirred up in the comment thread. I wish people would just take common-sense precautions and stop arguing that “it’s no worse than the flu.” There’s so much obvious evidence for the novel coronavirus being far worse than the flu…

November 17, 2020 at 4:29 pm

Lior Pachter

I am far from an expert on vitamin D so I can’t opine on the merits of taking it as a supplement. I did read quite a lot of COVID-19 related vitamin D literature in preparing this post and my conclusion (see the end of my post) is that there are some potentially interesting associations but they require much larger randomized placebo controlled clinical trials to verify.

November 17, 2020 at 7:18 pm

conradseitz

That was my impression as well. Perhaps an adequately sized randomized, placebo-controlled study will show a small but significant effect.

November 17, 2020 at 10:00 pm

Manolis Kellis

Lior Pachter’s concerns about our Córdoba reanalysis (power, comorbidities, randomization) are no different from those we initially raised in our paper, and are all fully addressed in our paper. Thanks for the scrutiny and thanks for the 3 useful minor suggestions (Šidák correction, 14 vs. 15 hypothesis correction, minor theorem wording diffs). We provide a detailed response here: http://compbio.mit.edu/calcifediol/Response_to_Pachter_criticism_of_Calcifediol_paper.pdf

November 18, 2020 at 1:52 am

Johanna Nelkner

Your calm and factual response is valuable and I was happy to read it. While I once enjoyed reading Lior Pachters Blog I recently felt he is/you are becoming biased by his/your obvious personal belief, ”it’s far worse than the flu”.

Any study, really any study could be taken and the statistical analysis be criticized on such a sophisticated level. Regardless of its outcome.

In my opinion, the experimenter’s bias or Rosenthal-effect hinders almost all of us currently, since the whole discussion is extremely emotionally heated. Might the effort put into this mathematical analysis base on the denial, that there could be such a simple treatment, which in turn might indicate a lower severity of the Coronavirus as some may feel?

Stating ”it’s not much worse than the flu, at least not worse enough to justify some of the taken measures” based on scientific research and statistics and promoting a more interdisciplinary approach to taking political measures has become politically incorrect and that scares me. (I am from Germany.)

Lior Pachter, would you mind putting the same effort into a mathematical analysis of the claimed effectivenesses of the vaccines by Biontech and Moderna? I would highly appreciate that.

November 18, 2020 at 2:08 am

Lior Pachter

I’m not sure what you want me to say about the Moderna trial. It’s a randomized, stratified, observer-blind, placebo-controlled trial, which is the proper way to assess vaccine safety and efficacy. The Córdoba trial was nothing of the sort. In any case, the interim numbers just released from the Moderna trial are very promising. As the trial participants continue to be monitored there will be more data providing much more confidence, but there is every reason to be optimistic. The trial protocol is very detailed, and if you’re interested you can read it here: https://www.modernatx.com/sites/default/files/mRNA-1273-P301-Protocol.pdf

December 3, 2020 at 1:39 pm

AustinMum

Dr. Kellis, you and Dr. Jungreis did a superb job of refuting this blog. Thank you very much. My question to Pachter is, “Why the venom?” Why are you so violently opposed to the idea of empowering ordinary people to improve their health to prevent severe Covid-19, and empowering physicians to overcome it early on?

December 3, 2020 at 1:45 pm

Lior Pachter

Hi AustinMum,

There is nothing I would like more than to discover that a simple action, such as taking vitamin D, prevents severe COVID. If a placebo randomized controlled trial were to show that I’d be thrilled. Unfortunately here we have a highly problematic experimental design that led to a flawed experiment from which we just can’t learn anything.

I wrote this blog post because Jungreis and Kellis specifically advocated *against* a placebo randomized controlled trial in their “mathematical analysis” and that is downright dangerous. Placebo randomized controlled trials are the gold standard for deciding whether treatments work, and I’m looking forward to seeing the results of some of the ongoing trials on vitamin D.

Separately from that, the “mathematical analysis” of Jungreis and Kellis is, as I pointed out in my blog post, statistically illiterate and downright embarrassing. Their “rebuttal” is more of the same. My hope is they post a retraction on medRxiv.

November 18, 2020 at 9:44 pm

Joe

Some of the spectacular (scientific) fallacies involving vitamin D are well documented here:

https://www.devaboone.com/post/vitamin-d-part-3-the-evidence

As it turns out, since the beginning of times, scientists loved to associate vitamin D with various risks – none of which have ever panned out.

December 3, 2020 at 1:47 pm

AustinMum

The problem with most of the RCTs for vitamin D is that the goals have been too low a serum level for too brief a time, so the solution proposed by the blogger you cited (take vitamin D only in low doses) is misguided. The fact is, the human body is designed for us to be outside in the middle of the day working, every day that the sun is out. Those who do this have very high serum vitamin D levels. That does raise the question of vitamin D being merely a marker of good health. The venom against the RCTs with supplements, such as this study from Cordoba seems to be because if supplementation results in dramatic health benefits, the “marker of good health” theory has to be discarded.

December 3, 2020 at 2:03 pm

Lior Pachter

My problems with the Cordoba study are articulated in my post above. Unfortunately it does not establish, as you seem to wish it had, that “supplementation results in dramatic health benefits”. It doesn’t establish the opposite either. It’s unfortunately just useless because it was very poorly designed.

December 3, 2020 at 11:54 am

AustinMum

The vaccine trials were essentially unblinded because the vaccine causes significant side effects, which allowed many of the participants to accurately guess their study arm. Participants wanted and expected the vaccine to work – that was why they participated. If someone in the placebo group had the sniffles, they would report it as a Covid-19 symptom. If someone in the vaccine group had a fever and achiness, they would assume that it was a side-effect of the vaccine or that it was another illness, and not report it, because they would have been confident that they were protected from Covid-19. None of the participants were tested objectively to see if they HAD Covid-19 – the results were based upon the subjective assessments of the participants themselves.

December 20, 2020 at 7:26 pm

Sen

Hi Lior,

Thanks for the interesting post.

I am a simple clinician rather than a statistician.

The result of this trial even though the sample size is small is very impressive.

Correct me if I am misguided, but at the end of the day if this was a case of randomising hospital admissions into two groups it would be highly improbable to see such results by chance.

As you correctly note (and as the authors of the original study also acknowledge to a certain extent), there is uneven distribution in some of the comorbidities, which could flag an issue with randomisation.

But it seems that this is not adequate to account for the large effect size seen in the study. If study subjects were truly randomised then this would still be a very impressive effect even accounting for the lack of blinding/placebo.

However, the randomisation and allocation process is not described in adequate detail and although the investigators were masked it does not appear that the treating specialists were masked. (I can’t see how it would be possible to mask the treating specialist given the lack of a placebo). To raise further doubts, there are a number of quality issues with this study which would ultimately make you question the study as a whole.

To me the big red flag in the study is the very high ICU admission rate in the placebo group of 50%. This is 5x higher than what the authors predict in the sample size estimation. This is not acknowledged or explained at all.

The second red flag is the complete lack of further information about patient outcomes.

The third issue is the failure to measure vitamin D levels. It is very difficult to comprehend how one could design a study of vitamin D replacement without measuring levels at baseline and during the study.

Ultimately, no amount of analysis is going to fix these issues with study design.