One of my favorite systems biology papers is the classic “Stochastic Gene Expression in a Single Cell” by Michael Elowitz, Arnold J. Levine, Eric D. Siggia and Peter S. Swain (in this post I refer to it as the ELSS paper).

What I particularly like about the paper is that it resulted from computational biology flipped. Unlike most genomics projects, where statistics and computation are servants of the data, in ELSS a statistical idea was implemented with biology, and the result was an elegant experiment that enabled a fundamental measurement to be made.

The fact that ELSS implemented a statistical idea with biology makes the statistics a natural starting point for understanding the paper. The statistical idea is what is known as *the law of total variance*. Given a random (response) variable $Y$ with a random covariate $Z$, the law of total variance states that the variance of $Y$ can be decomposed as:

$$\mathrm{Var}[Y] = E\left[\mathrm{Var}[Y|Z]\right] + \mathrm{Var}\left[E[Y|Z]\right].$$

There is a biological interpretation of this law that also serves to explain it: If the random variable $Y$ denotes the expression of a gene in a single cell ($Y$ being a random variable means that the expression is stochastic), and $Z$ denotes the (random) state of a cell, then the law of total variance describes the “total noise” $\mathrm{Var}[Y]$ in terms of what can be considered “intrinsic” (also called “unexplained”) and “extrinsic” (also called “explained”) noise or variance.

To understand intrinsic noise, first note that the expression $\mathrm{Var}[Y|Z]$ is the conditional variance, which is also a random variable; its possible values are the *variances* of the gene expression in different cell states. If $\mathrm{Var}[Y|Z]$ does not depend on $Z$ then the expression of the gene is said to be *homoscedastic*, i.e., it does not depend on cell state (if it does then it is said to be *heteroscedastic*). Because $\mathrm{Var}[Y|Z]$ is a random variable, the expression $E\left[\mathrm{Var}[Y|Z]\right]$ makes sense; it is simply the average variance (of the gene expression in single cells) across cell states (weighted by their probability of occurrence), hence the term “intrinsic noise” to describe it.

The expression $E[Y|Z]$ is a random variable whose possible values are the averages of the gene expression in different cell states. Thus, $\mathrm{Var}\left[E[Y|Z]\right]$ is the variance of the averages; intuitively it can be understood to describe the noise arising from differences in cell state, hence the term “extrinsic noise” to describe it (see here for a useful interactive visualization for exploring the law of total variance).
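The decomposition can be verified numerically on a toy hierarchical simulation (not from the original paper; the gamma/Poisson choices below are an arbitrary illustration, chosen because the Poisson makes the conditional mean and variance both equal to $Z$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000

# Z: random cell state; Y | Z: stochastic gene expression given the state.
z = rng.gamma(shape=5.0, scale=2.0, size=n)   # cell states Z
y = rng.poisson(lam=z)                        # expression Y | Z ~ Poisson(Z)

total = y.var()          # Var[Y]
# For a Poisson, Var[Y|Z] = E[Y|Z] = Z, so the two terms reduce to:
intrinsic = z.mean()     # E[Var[Y|Z]]
extrinsic = z.var()      # Var[E[Y|Z]]

print(total, intrinsic + extrinsic)  # the two numbers agree closely
```

Here the total noise ($\approx 30$) splits into intrinsic ($\approx 10$) and extrinsic ($\approx 20$) parts, matching $E[Z] + \mathrm{Var}[Z]$ for the chosen gamma distribution.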

The idea of ELSS was to design an experiment to measure the extent of intrinsic and extrinsic noise in gene expression by inserting two identically regulated reporter genes (cyan fluorescent protein and yellow fluorescent protein) into *E. coli* and measuring their expression in different cells. What this provides are measurements from the following model:

Random cell states are represented by random variables $Z_1,\ldots,Z_n$, which are independent and identically distributed, one for each of *n* different cells, while random variables $C_1,\ldots,C_n$ and $Y_1,\ldots,Y_n$ correspond to the gene expression of the cyan, respectively yellow, reporters in those cells. The ELSS experiment produces a single sample from each variable $C_i$ and $Y_i$, i.e. a pair of measurements $(c_i, y_i)$ for each cell. A hierarchical model for the experiment, in which the marginal (unconditional) distributions of the $C_i$ and $Y_i$ are identical, allows for estimating the intrinsic and extrinsic noise from the reporter expression measurements.
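A minimal sketch of such a hierarchical model as a simulator (the lognormal state and gamma reporter distributions are my illustrative assumptions, not distributions from the paper; what matters is that both reporters share the cell's state and are conditionally i.i.d.):

```python
import numpy as np

def sample_two_reporters(n, rng):
    """Draw (c_i, y_i) pairs: one shared state Z_i per cell, then two
    conditionally independent, identically distributed reporters."""
    z = rng.lognormal(mean=3.0, sigma=0.3, size=n)  # cell states Z_i
    c = rng.gamma(shape=z, scale=1.0)               # cyan reporter C_i | Z_i
    y = rng.gamma(shape=z, scale=1.0)               # yellow reporter Y_i | Z_i
    return c, y

rng = np.random.default_rng(1)
c, y = sample_two_reporters(5, rng)
print(list(zip(c, y)))  # one (c_i, y_i) measurement pair per cell
```

Because $C_i$ and $Y_i$ are drawn from the same conditional distribution given $Z_i$, their marginal distributions are identical, which is the key assumption behind the estimators below.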

The model above, on which ELSS is based, was not described in the original paper (more on that later). Instead, in ELSS the following estimates for intrinsic, extrinsic and total noise were simply written down:

$$\hat{\sigma}_{int}^2 = \frac{1}{n}\sum_{i=1}^n \frac{(c_i-y_i)^2}{2} \qquad \text{(intrinsic noise)}$$

$$\hat{\sigma}_{ext}^2 = \frac{1}{n}\sum_{i=1}^n c_i y_i - \bar{c}\,\bar{y} \qquad \text{(extrinsic noise)}$$

$$\hat{\sigma}_{tot}^2 = \frac{1}{n}\sum_{i=1}^n \frac{c_i^2+y_i^2}{2} - \bar{c}\,\bar{y} \qquad \text{(total noise)}$$

Here $c_i$ and $y_i$ are the measurements of cyan, respectively yellow, reporter expression in each cell, and $\bar{c} = \frac{1}{n}\sum_{i=1}^n c_i$, $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$.
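In code, the three plug-in estimates are straightforward; the simulated data below (shared lognormal state plus independent Gaussian reporter noise) is my stand-in for real reporter measurements:

```python
import numpy as np

def elss_estimates(c, y):
    """ELSS plug-in estimates of intrinsic, extrinsic and total noise."""
    c, y = np.asarray(c, float), np.asarray(y, float)
    n = len(c)
    intrinsic = np.sum((c - y) ** 2 / 2) / n
    extrinsic = np.sum(c * y) / n - c.mean() * y.mean()
    total = np.sum((c ** 2 + y ** 2) / 2) / n - c.mean() * y.mean()
    return intrinsic, extrinsic, total

rng = np.random.default_rng(2)
z = rng.lognormal(3.0, 0.3, size=100_000)  # shared cell states
c = z + rng.normal(0, 5.0, size=z.size)    # two identically regulated
y = z + rng.normal(0, 5.0, size=z.size)    # reporters, noise sd = 5
i_hat, e_hat, t_hat = elss_estimates(c, y)
print(i_hat, e_hat, t_hat)  # intrinsic ≈ 25, extrinsic ≈ Var(z)
```

Note that by construction the estimates satisfy $\hat{\sigma}_{tot}^2 = \hat{\sigma}_{int}^2 + \hat{\sigma}_{ext}^2$ exactly, mirroring the law of total variance.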

Last year, Audrey Fu, at the time a visiting scholar in Jonathan Pritchard’s lab and now assistant professor in statistical science at the University of Idaho, studied the ELSS paper as part of a journal club. She noticed some inconsistencies with the proposed estimates in the paper, e.g. it seemed to her that some were biased, whereas others were not, and she proceeded to investigate in more detail the statistical basis for the estimates. There had been a few papers trying to provide statistical background, motivation and interpretation for the formulas in ELSS (e.g. A. Hilfinger and J. Paulsson, Separating intrinsic from extrinsic fluctuations in dynamic biological systems, 2011), but there had not been an analysis of bias, or for that matter other statistical properties of the estimates. Audrey embarked on a post-publication peer review of the paper, and having seen such reviews on my blog contacted me to ask whether I would be interested in working with her. The project has been a fun hobby of ours for the past couple of months, eventually leading to a manuscript that we just posted on the arXiv:

Audrey Fu and Lior Pachter, Estimating intrinsic and extrinsic noise from single-cell gene expression measurements, arXiv 2016.

Our work provides what I think of as a “statistical companion” to the ELSS paper. First, we describe a formal hierarchical model which provides a useful starting point for thinking about estimators for intrinsic and extrinsic noise. With the model we re-examine the ELSS formulas, derive alternative estimators that minimize either bias or mean squared error, and revisit the intuition that underlies the extraction of intrinsic and extrinsic noise from data. Details are in the paper, but I briefly review some of the highlights here:

Figure 3a of the ELSS paper shows a scatterplot of data from two experiments, and provides a geometric interpretation of intrinsic and extrinsic noise that can guide intuition about them. We have redrawn their figure (albeit with a handful of points rather than with real data) in order to clarify the connections to the statistics:

The Elowitz *et al.* caption correctly stated that “Each point represents the mean fluorescence intensities from one cell. Spread of points perpendicular to the diagonal line on which CFP and YFP intensities are equal corresponds to intrinsic noise, whereas spread parallel to this line is increased by extrinsic noise”. While both statements are true, the one about intrinsic noise is precise whereas the one about extrinsic noise can be refined. In fact, the ELSS extrinsic noise estimate is the sample covariance (albeit biased due to a prefactor of *n* in the denominator rather than *n-1*), an observation made by Hilfinger and Paulsson. The sample covariance has a (well-known) geometric interpretation: specifically, we explain that it is the average (signed) area of the triangles formed by pairs of data points (one of which is the blue point in the figure): green triangles in Q1 and Q3 (some not shown) represent a positive contribution to the covariance and magenta triangles in Q2 and Q4 represent a negative contribution. Since most data points lie in the 1st (Q1) and 3rd (Q3) quadrants relative to the blue point, most of the contributions involving the blue point are positive. Similarly, since most pairs of data points can be connected by a line of positive slope, their positive contributions result in a positive covariance. We also explain why the naïve intuition of extrinsic noise as the variance of points along the line is problematic.
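The triangle-area interpretation can be checked numerically: the biased (1/*n*) sample covariance equals the average of the signed areas $\frac{1}{2}(c_i - c_j)(y_i - y_j)$ over all ordered pairs of points (a small sanity check of the identity, with arbitrary simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
c = rng.normal(size=200)
y = 0.7 * c + rng.normal(size=200)  # positively associated data

n = len(c)
biased_cov = np.sum(c * y) / n - c.mean() * y.mean()

# Signed area of the right triangle spanned by each ordered pair of
# points: area_ij = (c_i - c_j)(y_i - y_j) / 2.  Pairs along a line of
# positive slope contribute positively, negative slope negatively.
areas = (c[:, None] - c[None, :]) * (y[:, None] - y[None, :]) / 2
avg_area = areas.mean()

print(biased_cov, avg_area)  # identical up to floating point error
```

This is just the classical identity $\frac{1}{n}\sum_i c_i y_i - \bar{c}\bar{y} = \frac{1}{2n^2}\sum_{i,j}(c_i - c_j)(y_i - y_j)$, expanded term by term.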

The estimators we derive are summarized in the table below (Table 1 from our paper):

There is a bit of algebra that is required to derive formulas in the table (see the appendices of our paper). The take home messages are that:

- There is a subtle assumption underlying the ELSS intrinsic noise estimator that makes sense for the experiments in the ELSS paper, but not for every type of experiment in which the ELSS formulas are currently used. This has to do with the mean expression levels of the two reporters, and we provide a rationale and criteria for when to apply quantile normalization to the data.
- The ELSS intrinsic noise estimator is unbiased, whereas the ELSS extrinsic noise estimator is (slightly) biased. This asymmetry can be easily rectified with adjustments we derive.
- The standard unbiased estimator for variance (obtained via the Bessel correction) is frequently, and correctly, criticized for trading off an increase in mean squared error for the elimination of bias. In practice, it can be more important to minimize mean squared error (MSE). For this reason we derive MSE-minimizing estimators. While the MSE-minimizing estimates converge quickly to the unbiased estimates (as a function of the number of cells), we believe there may be applications of the law of total variance to problems in systems biology where sample sizes are smaller, in which case our formulas may become useful.
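The bias point can be illustrated by simulation. The simple $n/(n-1)$ rescaling below is the standard Bessel-type correction for the 1/*n* sample covariance, shown only to make the asymmetry concrete; it is not necessarily the exact adjustment derived in our paper:

```python
import numpy as np

rng = np.random.default_rng(4)
n, true_cov = 10, 4.0
cov = np.array([[5.0, true_cov], [true_cov, 5.0]])

# Many small (n = 10 cells) experiments; average each estimator over them.
reps = 100_000
s = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))  # (reps, n, 2)
c, y = s[..., 0], s[..., 1]
biased = (c * y).mean(axis=1) - c.mean(axis=1) * y.mean(axis=1)

biased_mean = biased.mean()               # E ≈ (n-1)/n * true_cov = 3.6
corrected_mean = biased_mean * n / (n - 1)  # Bessel-type rescaling ≈ 4.0
print(biased_mean, corrected_mean)
```

With only 10 cells per experiment the 1/*n* estimator systematically undershoots the true covariance by the factor $(n-1)/n$, and the rescaled version recovers it on average.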

The ELSS paper has been cited more than 3,000 times and with the recent emergence of large scale single-cell RNA-Seq the ideas in the paper are more relevant than ever. Hopefully our statistical companion to the paper will be useful to those seeking to obtain a better understanding of the methods, or those who plan to extend and apply them.

## 6 comments


January 18, 2016 at 6:03 pm

Arjun Raj: Great post on one of my favorite papers! Looking forward to going through this in some detail.

One experimental point that many have noted in the subsequent years is that it’s often quite hard to really ensure that the two reporters are well and truly identical–indeed, there are often differences in degradation rate, photophysical properties, etc. An alternative approach is to measure how total variability increases upon adding another copy of the *exact* same reporter. E.g., compare one copy of GFP in the cell to two copies. If variability is intrinsic, the variance goes up by a factor of two; if extrinsic, by a factor of four. I have not done those sort of experiments, but those that have feel more confident in that approach. Curious about what the statistical properties of such an experiment would be.

January 18, 2016 at 6:31 pm

Lior Pachter: Very interesting idea! We’ll think about it.

January 18, 2016 at 6:51 pm

Arjun Raj: Here is a nice recent paper on the one/two copy noise measurement from Marc Sherman and Barak Cohen (Cell Systems 2015): http://www.sciencedirect.com/science/article/pii/S2405471215001854

and here’s to my knowledge the first paper to use this approach (Volfson… Hasty, Nature 2005): http://www.nature.com/nature/journal/v439/n7078/full/nature04281.html

January 22, 2016 at 7:46 pm

Audrey Fu: Thanks for this very interesting comment. If I understand the experiment in Volfson et al. (2006) correctly, cells in strain A each have a single copy, and cells in strain B each have two copies. I think that this setup and the following calculation of intrinsic and extrinsic noise assume that the variance for each single copy in strain B would be identical to that in strain A. While this makes sense theoretically, in reality the two strains are effectively two batches, and the batch effect is likely non-negligible.

Assuming no batch (strain) effect, the estimation of intrinsic and extrinsic noise in Volfson et al. (2006) is actually the same as in Elowitz et al. (2002). The difference is that the 2nd reporter in the two-reporter experiment is replaced with the 2nd copy in the copy number experiment. This means that, similar to the scatterplot of CFP versus YFP, one can plot the scatterplot of strain A versus (strain B – strain A) for the copy number experiment. Just as extrinsic noise is the covariance between the two reporters in the two-reporter experiment, it is the covariance between the two copies in the copy number experiment.

Formulas in the supplement (sec. II on p 1) to the Volfson paper are consistent with the description above. In their notation, V1 and V2 are the variance in the 1-copy and 2-copy strains, respectively. Vi and Ve are intrinsic and extrinsic noise, respectively. Their formulas are:

Vi = 2V1 – V2 / 2; (1)

Ve = V2 / 2 – V1. (2)

To see why these formulas are used, we can again apply the hierarchical model presented in the blog post, replacing C and Y with C1 and C2 (assuming only the, say, CFP reporter is used in the copy number experiment). Then in the 1-copy strain,

V1 = Var[C1] = Vi + Ve = Var[C2]. (3)

In the 2-copy strain,

V2 = Var[C1 + C2] = Var[C1] + Var[C2] + 2Cov[C1,C2] = 2(Vi + Ve) + 2Ve. (4)

Together, (3) and (4) give rise to (1) and (2).
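A quick numerical check of formulas (1)–(4) under the no-batch-effect assumption (the Gaussian components and the values Vi = 2, Ve = 3 are my illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
vi_true, ve_true, m = 2.0, 3.0, 1_000_000

# Strain A: one copy per cell. Strain B: two copies sharing the same
# extrinsic (cell state) component but with independent intrinsic noise.
e_a = rng.normal(0, np.sqrt(ve_true), m)
one_copy = e_a + rng.normal(0, np.sqrt(vi_true), m)
e_b = rng.normal(0, np.sqrt(ve_true), m)
two_copy = (e_b + rng.normal(0, np.sqrt(vi_true), m)) \
         + (e_b + rng.normal(0, np.sqrt(vi_true), m))

v1, v2 = one_copy.var(), two_copy.var()  # V1 = Vi + Ve, V2 = 2Vi + 4Ve
vi_hat = 2 * v1 - v2 / 2  # formula (1)
ve_hat = v2 / 2 - v1      # formula (2)
print(vi_hat, ve_hat)     # ≈ 2.0 and ≈ 3.0
```

Intrinsic variance doubles from strain A to strain B while the shared extrinsic component quadruples, which is exactly what lets (1) and (2) separate the two.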

The other paper you mentioned in the comment, Sherman and Cohen (2015), explicitly stated that the covariance between the two copies is the extrinsic noise (equations S19 and S20 in S2.5.1 of the supplement). However, they saw large variation in the results from these formulas (they also mentioned that this experience was consistent with Stewart-Ornstein et al. 2012, Molecular Cell; I haven’t looked into this ref yet). They went on to derive another approach for better estimation. I think that the large variation Sherman and Cohen saw with formulas (3) and (4) is possibly an indication of the non-negligible batch (strain) effect.

January 22, 2016 at 9:30 pm

Arjun RajInteresting! I would not at all be surprised if there is a “batch” effect (I usually just call it “cell lines are just different” 🙂 ), and that’s certainly an issue. I wonder if it’s a bigger issue than the differences between CFP and YFP in the two-color ELSS-style experiments (which would lead to artificially higher intrinsic noise). I haven’t done any of these comparisons myself experimentally, just relaying what I’ve heard from others.

April 21, 2016 at 12:38 pm

NimwegenLab (@NimwegenLab): While visiting Berkeley, Lior showed me this work and I had some fun working out the Bayesian solution to this problem. I think the differences are highly educational in demonstrating why Bayesian methods are superior, and I decided to write the analysis up and put it on BioRxiv:

Inferring intrinsic and extrinsic noise from a dual fluorescent reporter

http://biorxiv.org/content/early/2016/04/21/049486

Hope some of you will find it useful.