I visited Duke’s mathematics department yesterday to give a talk in the mathematical biology seminar. After an interesting day meeting many mathematicians and (computational) biologists, I had an excellent dinner with Jonathan Mattingly, Sayan Mukherjee, Michael Reed and David Schaeffer. During dinner conversation, the topic of probability theory (and how to teach it) came up, and in particular Buffon’s needle problem.

The question was posed by Georges-Louis Leclerc, Comte de Buffon in the 18th century:

Suppose we have a floor made of parallel strips of wood, each the same width, and we drop a needle onto the floor. What is the probability that the needle will lie across a line between two strips?

If the strips are a distance $t$ apart and the needle has length $l \leq t$, then it is easy to see that the probability $P$ is given by

$P = \int_{\theta=0}^{\frac{\pi}{2}} \int_{x=0}^{\frac{l}{2}\sin\theta} \frac{4}{t\pi}\, dx\, d\theta = \frac{2l}{t\pi}$.

The appearance of $\pi$ in the denominator turns the problem into a Monte Carlo technique for estimating $\pi$: simply simulate random needle tosses and count crossings.
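A minimal Python sketch of such a simulation might look like this (the function name and default parameters are my own choices, just for illustration):

```python
import math
import random

def estimate_pi(trials=1_000_000, l=1.0, t=1.0, seed=0):
    """Estimate pi by simulating Buffon needle tosses (requires l <= t)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(trials):
        x = rng.uniform(0, t / 2)            # distance from needle center to nearest line
        theta = rng.uniform(0, math.pi / 2)  # acute angle between needle and the lines
        if x <= (l / 2) * math.sin(theta):   # the needle crosses a line
            crossings += 1
    # P = 2l/(t*pi), so pi is approximately 2*l*trials/(t*crossings)
    return 2 * l * trials / (t * crossings)
```

The convergence is painfully slow (the error decays like $1/\sqrt{\text{trials}}$), which is part of why the problem is a better teaching example than a practical algorithm.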

It turns out there is a much more elegant solution to the problem, one that does not require calculus. I learned of it from Gian-Carlo Rota when I was a graduate student at MIT. It appears in his book Introduction to Geometric Probability (with Dan Klain), which I have occasionally used when teaching Math 249. The argument relies on the linearity of expectation, and goes as follows:

Let $f(l)$ denote the expected number of crossings when a needle of length $l$ is thrown on the floor. Now consider two needles, one of length $l$ and the other of length $m$, attached to each other end to end (possibly at some angle). If $X_1$ is a random variable describing the number of crossings of the first needle, and $X_2$ of the second, it's certainly the case that $X_1$ and $X_2$ are dependent, but because expectation is linear, it is the case that $E(X_1+X_2) = E(X_1)+E(X_2)$. In other words, the total number of crossings is, in expectation, $f(l)+f(m)$.
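This can be checked numerically. The following Python sketch (my own toy code, with the floor's lines placed at integer multiples of $t$) throws two rigidly attached needles and verifies that the average crossing counts of the two pieces match $\frac{2l}{t\pi}$ and $\frac{2m}{t\pi}$ separately, even though $X_1$ and $X_2$ are dependent:

```python
import math
import random

def crossings(x_start, length, angle, t=1.0):
    """Number of lines (at integer multiples of t) crossed by a segment."""
    x_end = x_start + length * math.cos(angle)
    lo, hi = sorted((x_start, x_end))
    return math.floor(hi / t) - math.floor(lo / t)

def hinged_experiment(l=0.7, m=0.4, bend=1.0, trials=200_000, t=1.0, seed=1):
    """Average crossings of two needles joined end to end at a fixed bend angle."""
    rng = random.Random(seed)
    s1 = s2 = 0
    for _ in range(trials):
        x = rng.uniform(0, t)                # first endpoint of the first needle
        phi = rng.uniform(0, 2 * math.pi)    # orientation of the first needle
        x_joint = x + l * math.cos(phi)      # the hinge where the needles meet
        s1 += crossings(x, l, phi, t)
        s2 += crossings(x_joint, m, phi + bend, t)
    return s1 / trials, s2 / trials
```

The key point the simulation illustrates is that the marginal distribution of each needle's position and orientation is the same as for a needle thrown alone, so each piece contributes its own $f(\cdot)$ to the expectation regardless of the dependence.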

Buffon’s needle problem: what is the probability that a needle of length $l \leq t$ crosses a line? (A) A short needle thrown at random on a floor with parallel lines. (B) Two connected needles. The expected number of crossings is proportional to the sum of their lengths. (C) A circle of diameter $t$ always crosses exactly two lines.

It follows that $f$ is a linear function, and since $f(0)=0$, we have that $f(l) = cl$ where $c$ is some constant. Now consider a circle of diameter $t$. Such a circle, when thrown on the floor, always crosses the parallel lines exactly twice. If $C$ is a regular polygon with vertices on the circle, and the total length of the polygon's segments is $l$, then the expected total number of crossings is $f(l)$. Taking the limit as the number of segments in the polygon goes to infinity, we find that $f(t \pi ) = 2$. In other words,
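The polygon limit can also be simulated. The sketch below (again my own toy code, with lines at integer multiples of $t$) inscribes a regular $n$-gon in a circle of diameter $t$, throws it at random, and averages the total number of crossings; by linearity this average should be $\frac{2 n \sin(\pi/n)}{\pi}$, which tends to 2 as $n \to \infty$:

```python
import math
import random

def segment_crossings(x0, x1, t=1.0):
    """Number of lines (at integer multiples of t) between two x-coordinates."""
    lo, hi = sorted((x0, x1))
    return math.floor(hi / t) - math.floor(lo / t)

def polygon_mean_crossings(sides, trials=100_000, t=1.0, seed=2):
    """Throw a regular polygon inscribed in a circle of diameter t;
    return the average total number of line crossings."""
    rng = random.Random(seed)
    r = t / 2
    total = 0
    for _ in range(trials):
        c = rng.uniform(0, t)              # x-coordinate of the circle's center
        rot = rng.uniform(0, 2 * math.pi)  # random rotation of the polygon
        xs = [c + r * math.cos(rot + 2 * math.pi * k / sides)
              for k in range(sides)]
        total += sum(segment_crossings(xs[k], xs[(k + 1) % sides])
                     for k in range(sides))
    return total / trials
```

Already for a 30-gon the average is within half a percent of 2, since $\frac{2 \cdot 30 \sin(\pi/30)}{\pi} \approx 1.996$.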

$f(t \pi) = c \cdot t \pi = 2 \Rightarrow c = \frac{2}{t \pi}$,

and the expected number of crossings of a needle of length $l$ is $\frac{2l}{t \pi}$. If $l < t$, the number of crossings is either 0 or 1, so the expected number of crossings is, by definition of expectation, equal to the probability of a single crossing. This solves Buffon’s problem: no calculus required!

The linearity of expectation appears elementary at first glance. The proof is simple, and it is one of the first “facts” learned in statistics; I taught it to my Math 10 students last week. However, the apparent simplicity masks its depth and utility: the above example is cute, and one of my favorites, but linearity of expectation is useful in many settings. For example, I recently saw an interesting application in an arXiv preprint by Anand Bhaskar, Andy Clark and Yun Song, “Distortion of genealogical properties when the sample is very large”.

The paper addresses an important question: how well does the coalescent approximate discrete-time random mating models when sample sizes are large? The question matters because population sequencing is starting to involve hundreds of thousands, if not millions, of individuals.

The results of Bhaskar, Clark and Song are based on dynamic programming calculations of various genealogical quantities under the discrete-time Wright-Fisher model. An example is the expected frequency spectrum for random samples of individuals from a population. By frequency spectrum they mean, for each $k$, the expected number of polymorphic sites with $k$ derived alleles and $n-k$ ancestral alleles under an infinite-sites model of mutation in a sample of $n$ individuals. Without going into details (see their equations (8), (9) and (10)), the point is that the dynamic programming recursions are possible precisely because they compute expected frequencies: linearity of expectation is what allows the recursions to be derived.
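Their recursions are not reproduced here, but the role of linearity can be illustrated with a classical toy example from the standard Kingman coalescent: the expected total length of branches subtending exactly $k$ of $n$ sampled leaves is $2/k$ (in coalescent time units), so with mutations falling at rate $\theta/2$ along branches, linearity of expectation immediately gives the expected frequency spectrum $E[\xi_k] = \theta/k$. A small simulation (my own sketch, not the authors' dynamic program) confirms the branch-length identity:

```python
import random

def branch_lengths_by_size(n, rng):
    """One Kingman coalescent tree: total branch length subtending k leaves."""
    lineages = [frozenset([i]) for i in range(n)]
    L = [0.0] * n  # L[k] = total length of branches ancestral to exactly k leaves
    while len(lineages) > 1:
        m = len(lineages)
        dt = rng.expovariate(m * (m - 1) / 2)  # waiting time to next coalescence
        for lin in lineages:
            L[len(lin)] += dt                  # every current branch grows by dt
        i, j = sorted(rng.sample(range(m), 2)) # pick a random pair to merge
        merged = lineages[i] | lineages[j]
        del lineages[j], lineages[i]
        lineages.append(merged)
    return L

def mean_spectrum(n=5, trials=20_000, seed=3):
    """Average L_k over many trees; should approach 2/k for k = 1..n-1."""
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(trials):
        for k, length in enumerate(branch_lengths_by_size(n, rng)):
            sums[k] += length
    return [s / trials for s in sums[1:]]
```

The point of the toy example is the same as in the paper: quantities defined as expectations decompose additively over branches (or over generations), and that additivity is exactly what a recursion can exploit.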

None of this has anything to do with my seminar, except for the fact that the expectation-maximization algorithm did make a brief appearance, as it frequently does in my lectures these days. I spoke mainly about some of the mathematics problems that arise in comparative transcriptomics, with a view towards a principled approach to comparing transcriptomes between cells, tissues, individuals and species.

The Duke Chapel. While I was inside someone was playing the organ, and as I stared at the ceiling, I could have sworn I was in Europe.