Steven Miller is a math professor at Williams College who specializes in number theory and theoretical probability theory. A few days ago he published a “declaration” in which he performs an “analysis” of phone bank data of registered Republicans in Pennsylvania. The data was provided to him by Matt Braynard, who led Trump’s data team during the 2016 campaign. Miller frames his “analysis” as an attempt to “estimate the number of fraudulent ballots in Pennsylvania”, and his analysis of the data leads him to conclude that

“almost surely…the number of ballots requested by someone other than the registered Republican is between 37,001 and 58,914, and almost surely the number of ballots requested by registered Republicans and returned but not counted is in the range from 38,910 to 56,483.”

A review of Miller’s “analysis” leads me to conclude that his estimates are fundamentally flawed and that the data as presented provide no evidence of voter fraud.

This conclusion is easy to arrive at. The declaration claims (without a reference) that 165,412 mail-in ballots requested by registered Republicans in PA “had not arrived to be counted” as of November 16th, 2020. The data Miller analyzed was based on an attempt to call some of these registered Republicans by phone to assess what happened to their ballots. The number of phone calls made, according to the declaration, is 23,184 = 17,000 + 3,500 + 2,684. The number 17,000 consists of phone calls that did not provide information, either because an answering machine picked up instead of a person, or because a person picked up and summarily hung up. 3,500 numbers were characterized as “bad numbers / language barrier”, and 2,684 individuals answered the phone. Curiously, Miller writes that “Almost 20,000 people were called”, when in fact 23,184 > 20,000.

In any case, clearly many of the phone numbers dialed were simply wrong numbers, as evidenced by the number of “bad” calls: 3,500. It’s easy to imagine how this can happen: confusion because some individuals share a name, phone numbers have changed, people move, the phone bank makes an error when dialing, etc. Let b be the fraction of phone numbers out of the 23,184 that were “bad”, i.e. incorrect. We can estimate b by noting that we have some information about it: we know that the 3,500 “bad numbers” were bad (by definition). Additionally, it is reported in the declaration that 556 people literally said that they did not request a ballot, and there is no reason not to take them at their word. We don’t know what fraction of the 17,000 individuals who were called and did not pick up or hung up were wrong numbers, but we do know that the number of bad numbers out of the total must equal the number of bad numbers among the 17,000 plus those we know for sure were bad numbers, i.e.

23,184 \cdot b = 17,000 \cdot b + 556 + 3,500.

Solving for b we find that b \approx \frac{2}{3}. I’m surprised the number is so low. One would expect that individuals who requested ballots, but then didn’t send them in, would be enriched for people who have recently moved or are in the process of moving, or who have other issues that make them difficult or impossible to reach.
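The arithmetic behind this estimate can be checked in a few lines. The following is only a sketch of the calculation described above, using the figures quoted from the declaration; the function and variable names are my own, chosen for illustration.

```python
from fractions import Fraction

# Figures as quoted from the declaration in the text above.
TOTAL_CALLS = 23_184       # 17,000 + 3,500 + 2,684
NO_INFO_CALLS = 17_000     # answering machine, or picked up and hung up
KNOWN_BAD = 3_500          # "bad numbers / language barrier"
SAID_NO_REQUEST = 556      # said they did not request a ballot


def estimate_bad_fraction(total, no_info, known_bad, said_no_request):
    """Solve total * b = no_info * b + said_no_request + known_bad for b."""
    return Fraction(said_no_request + known_bad, total - no_info)


b = estimate_bad_fraction(TOTAL_CALLS, NO_INFO_CALLS, KNOWN_BAD, SAID_NO_REQUEST)
print(f"b = {b} ~ {float(b):.3f}")  # close to 2/3
```

The exact fraction is used only to avoid rounding noise; the point is that b lands near two thirds under the stated assumption that the unanswered calls contain bad numbers at the same rate as the calls overall.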

The fraction of bad calls derived translates to about 1,700 bad numbers out of the 2,684 people that were reached. This easily explains not only the 556 individuals who said they did not request a ballot, but also the 463 individuals who said that they mailed back their ballots. In the case of the latter there is no irregularity; the number of bad calls suggests that all those individuals were reached in error and their ballots were legitimately counted so they weren’t part of the 165,412. It also explains the 544 individuals who said they voted in person.
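As a sanity check on this paragraph, here is a short sketch that applies the estimated bad-number fraction to the 2,684 people who answered and compares the result with the three groups of responses. The numbers are those quoted from the declaration; the variable names are hypothetical, and the calculation assumes the same bad-number rate applies to answered calls as to the rest.

```python
# Figures as quoted from the declaration in the text above.
TOTAL_CALLS = 23_184
NO_INFO_CALLS = 17_000
KNOWN_BAD = 3_500
ANSWERED = 2_684

SAID_NO_REQUEST = 556      # said they did not request a ballot
SAID_MAILED_BACK = 463     # said they mailed their ballot back
SAID_VOTED_IN_PERSON = 544 # said they voted in person

# Bad-number fraction from 23,184 * b = 17,000 * b + 556 + 3,500.
b = (SAID_NO_REQUEST + KNOWN_BAD) / (TOTAL_CALLS - NO_INFO_CALLS)

# Expected number of wrongly reached people among those who answered.
expected_bad_among_answered = b * ANSWERED

# Responses that a wrong number would account for.
responses_to_explain = SAID_NO_REQUEST + SAID_MAILED_BACK + SAID_VOTED_IN_PERSON

print(f"expected bad numbers among answered: {expected_bad_among_answered:.0f}")
print(f"responses to explain: {responses_to_explain}")
print("covered by bad numbers:", responses_to_explain <= expected_bad_among_answered)
```

With these inputs the expected count comes out somewhat above 1,700, which exceeds the combined 1,563 responses in the three groups, consistent with the argument that wrong numbers alone can account for them.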

That’s it. The data don’t point to any fraud or irregularity, just a poorly designed poll with poor response rates and lots of erroneous information due to bad phone numbers. There is nothing to explain. Miller, on the other hand, has some things to explain.

First, I note that his declaration begins with a signed page asserting various facts about Steven Miller and the analysis he performed. Notably absent from the page, or anywhere else in the document, is a disclosure of funding source for the work and of conflicts of interest. On his work webpage, Miller specifically states that one should always acknowledge funding support.

Second, if Miller really wanted to understand the reason why some ballots were requested for mail-in, but had not yet arrived to be counted, he would also obtain data from Democrats. That would provide a control on various aspects of the analysis, and help to establish whether irregularities, if they were to be detected, were of a partisan nature. Why did Miller not include an analysis of such data?

Third, one might wonder why Steven Miller chose to publish this “declaration”. Surely a professor who has taught probability and statistics for 15 years (as Miller claims he has) must understand that his own “analysis” is fundamentally flawed, right? Then again, I’ve previously found that excellent pure mathematicians are prone to falling into a data analysis trap, i.e. a situation where their lack of experience analyzing real-world datasets leads them to trust a naïve analysis that is deeply flawed. To better understand whether this might be the case with Miller, I examined his publication record, which he has shared publicly via Google Scholar, to see whether he has worked with data. The first thing I noticed was that he has published more than 700 articles (!) and has an h-index of 47 for a total of 8,634 citations… an incredible record for any professor, and especially for a mathematician. A Google search for his name displays this impressive number of citations.

As it turns out, his impressive publication record is a mirage. When I took a closer look, I found that many of the papers he lists on his Google Scholar page are not his, but rather articles published by other authors with the name S Miller. “His” most cited article was published in 1955, well before he was born. Miller’s own most cited paper is a short unpublished tutorial on least squares (I was curious and reviewed it as well, only to find some inaccuracies, but hey, I don’t work for this guy).

I will note that in creating his Google Scholar page, Miller did not just enter his name and email address (required). He went to the effort of customizing the page, including the addition of keywords and a link to his homepage, and in doing so followed his own general advice to curate one’s CV (strangely, he also dispenses advice on job interviews, including about shaving; I guess only women interview for jobs?). But I digress: the question is, why does his Google Scholar page display massively inflated publication statistics based on papers that are not his? I’ve seen this before, and in one case where I had hard evidence that it was done deliberately to mislead, I reported it as fraud. Regardless of Miller’s motivations, by looking at his actual publications I confirmed what I suspected, namely that he has hardly any experience analyzing real-world data. I’m willing to chalk up his embarrassing “declaration” to statistical illiteracy and naïveté.

In summary, Steven Miller’s declaration provides no evidence whatsoever of voter fraud in Pennsylvania.