The International Society for Clinical Densitometry has an official position on the number of decimal digits that should be reported when describing the results of bone density scans:

  • BMD (bone mineral density): three digits (e.g., 0.927 g/cm²).
  • T-score: one digit (e.g., –2.3).
  • Z-score: one digit (e.g., 1.7).
  • BMC (bone mineral content): two digits (e.g., 31.76 g).
  • Area: two digits (e.g., 43.25 cm²).
  • % reference database: integer (e.g., 82%).
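For concreteness, here is a minimal Python sketch of what reporting at the recommended precision looks like (the scan values are made up for illustration):

```python
# Hypothetical scan values, formatted to the ISCD-recommended precision.
bmd, t_score, z_score, bmc, area, pct_ref = 0.9271, -2.34, 1.72, 31.764, 43.251, 0.823

print(f"BMD: {bmd:.3f} g/cm^2")                # three decimal digits
print(f"T-score: {t_score:.1f}")               # one decimal digit
print(f"Z-score: {z_score:.1f}")               # one decimal digit
print(f"BMC: {bmc:.2f} g")                     # two decimal digits
print(f"Area: {area:.2f} cm^2")                # two decimal digits
print(f"% reference database: {pct_ref:.0%}")  # integer percent
```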

Are these recommendations reasonable? Maybe not. For example, they fly in the face of the recommendation in the “seminal” work of Ehrenberg (Journal of the Royal Statistical Society A, 1977), which is to use two decimal digits.

Two? Three? What should it be? This is what my Math10 students always ask me.

I answered this question for my freshmen in Math10 two weeks ago using an example based on a dataset from the paper Schwartz, M. “A biomathematical approach to clinical tumor growth.” Cancer 14 (1961): 1272–1294. The paper has a dataset consisting of the size of a pulmonary neoplasm over time:

[Figure: measured tumor size over time.]

A simple model for the tumor growth is f(t) = a \cdot b^t, and in class I showed how a surprisingly good fit can be obtained by interpolating through only two points (t=0 and t=208):

f(0)= 1.8 \Rightarrow a \cdot b^0 = 1.8 \Rightarrow a = 1.8.

Then we have that f(208) = 3.5 \Rightarrow 1.8 \cdot b^{208} = 3.5 \Rightarrow b = \sqrt[208]{3.5/1.8} \approx 1.00320.

The exponential function f(t)=1.8 \cdot 1.00320^t is shown in the figure. The fit is surprisingly good considering it is based on an interpolation using only two points. The point of the example is that if one rounds the answer 1.00320 to two decimal digits then one obtains f(t) = 1.8 \cdot 1^t = 1.8, which does not grow at all. In other words, a small (quantitative) change in the assumptions (restricting the rate to intervals differing by 0.01) results in a major qualitative change in results: with two decimal digits the patient lives, with three… death!
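To see the effect numerically, here is a minimal Python sketch of the two-point interpolation and of what rounding the growth rate does (the only inputs are the two data values from the text):

```python
# The two-point fit above: f(t) = a * b^t with f(0) = 1.8 and f(208) = 3.5.
a = 1.8
b = (3.5 / 1.8) ** (1 / 208)  # the 208th root of 3.5/1.8
print(b)  # ~1.00320

def f(t, rate):
    return a * rate ** t

# The qualitative effect of rounding the rate to two decimal digits:
for t in (0, 208, 416, 1000):
    print(t, f(t, b), f(t, round(b, 2)))  # round(b, 2) == 1.0: no growth at all
```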

This simple example of decimal digit arithmetic illustrates a pitfall affecting many computational biology studies. It is tempting to believe that \mbox{Qualitative} \subset \mbox{Quantitative}, i.e., that focusing on qualitative analysis allows for the flexibility of ignoring quantitative assumptions. However, frequently the qualitative devil is in the quantitative details.

One field where qualitative results are prevalent, and therefore the devil strikes frequently, is network theory. The emphasis on searching for “universal phenomena”, i.e. qualitative results applicable to networks arising in different contexts, arguably originates with Milgram’s small world experiment that led to the concept of “six degrees of separation”, and with Watts and Strogatz’s theory of collective dynamics in small-world networks (my friend Peter Dodds replicated Milgram’s original experiment using email in “An experimental study of search in global social networks”, Science 301 (2003), p 827–829). In mathematics these ideas have been popularized via the Erdös number, which is the distance between an author and Paul Erdös in a graph where two individuals are connected by an edge if they have published together. My Erdös number is 2, a fact that is of interest only in that it divulges my combinatorics roots. I’m prouder of other connections to researchers who write excellent papers on topics of current interest. For example, I’m pleased to be distance 2 away from Carl Bergstrom via my former undergraduate student Frazer Meacham (currently one of Carl’s Ph.D. students) and the papers:

  1. Meacham, Frazer, Dario Boffelli, Joseph Dhahbi, David IK Martin, Meromit Singer, and Lior Pachter. “Identification and Correction of Systematic Error in High-throughput Sequence Data.” BMC Bioinformatics 12, no. 1 (November 21, 2011): 451. doi:10.1186/1471-2105-12-451.
  2. Meacham, Frazer, Aaron Perlmutter, and Carl T. Bergstrom. “Honest Signaling with Costly Gambles.” Journal of the Royal Society Interface 10, no. 87 (October 6, 2013): 20130469. doi:10.1098/rsif.2013.0469.
One of Bergstrom’s insightful papers, in which he exposes the devil (in the quantitative details), is “Nodal Dynamics, Not Degree Distributions, Determine the Structural Controllability of Complex Networks” by Cowan et al., PLoS ONE 7 (2012), e38398. It describes a not-so-subtle example of an unreasonable quantitative assumption that leads to intuition about network structural controllability that is, to be blunt, false. The example Carl critiques is from the paper

“Controllability of complex networks” by Yang-Yu Liu, Jean-Jacques Slotine and Albert-László Barabási, Nature 473 (2011), p 167–173. The mathematics is straightforward: it concerns the dynamics of linear systems of the form

\frac{d{\bf x}(t)}{dt} = -p{\bf x}(t) + A{\bf x}(t) + B{\bf u}(t).

The dynamics can be viewed as taking place on a graph whose adjacency matrix is given by the non-zero entries of A (an nxn matrix). The vector -p (of size n) is called the pole of the linear system and describes the intrinsic dynamics at the nodes (the product p{\bf x}(t) is taken componentwise). The vector {\bf u} (of size m) corresponds to external inputs that are coupled to the system via the nxm matrix B.
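Before simplifying, it may help to see such a system in motion. Here is a minimal simulation sketch in Python; A, B, p and the constant input u are made-up illustrative values, not taken from either paper:

```python
# Simulate dx/dt = -p*x + A x + B u for a small made-up system.
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])  # adjacency: an edge from node 2 into node 1
B = np.array([[1.0],
              [0.0]])       # one external input, coupled to node 1
p = np.array([0.5, 0.2])    # poles: finite time constants 1/p_i at the nodes

def rhs(t, x):
    u = np.array([1.0])     # a constant input signal
    return -p * x + A @ x + B @ u

sol = solve_ivp(rhs, (0.0, 20.0), np.zeros(2))
print(sol.y[:, -1])         # the state relaxes toward a steady state set by p
```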

An immediate observation is that the vector p is unnecessary and can be incorporated into the diagonal of the matrix A (i.e., A is replaced by A - \mathrm{diag}(p)). An element on the diagonal of A that is then non-zero can be considered to be a self-loop. The system then becomes

\frac{d{\bf x}(t)}{dt} =A{\bf x}(t) + B{\bf u}(t)

which is the form considered in the Liu et al. paper (their equation (1)). The system is controllable if there are time-dependent inputs {\bf u}(t) that can drive the system from any initial state to any target end state. Mathematically, this is equivalent to asking whether the matrix C=(B,AB,A^2B,\ldots, A^{n-1}B) has full rank (a classic result known as Kalman’s criterion of controllability). Structural controllability is a weaker requirement, in which the question is whether, given only the sparsity patterns of A and B, there exist weights for the edges so that the resulting weighted matrices satisfy Kalman’s criterion. The point of structural controllability is a theorem showing that structurally controllable systems are generically controllable.
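Kalman’s criterion is easy to check numerically. Below is a minimal Python sketch; the chain example and its weights are made up:

```python
# Kalman's criterion: (A, B) is controllable iff the controllability matrix
# C = [B, AB, A^2 B, ..., A^{n-1} B] has rank n.
import numpy as np

def is_controllable(A, B):
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    C = np.hstack(blocks)
    return np.linalg.matrix_rank(C) == n

# Example: a chain 1 -> 2 -> 3 driven by a single input at node 1.
A = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
B = np.array([[1.0], [0.0], [0.0]])
print(is_controllable(A, B))  # True
```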

The Liu et al. paper makes two points: the first is that if M is the size of a maximum matching of the graph given by an nxn adjacency matrix A, then the minimum m for which there exists a matrix B of size nxm making the system structurally controllable is m=\max(n-M,1) (it turns out this first point had already been made, in a paper by Commault et al. from 2002). The second point is that m is related to the degree distribution of the graph A.
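The matching computation can be carried out directly. Here is a minimal sketch using networkx’s bipartite (Hopcroft–Karp) matching, with made-up example graphs: a directed edge i → j becomes an edge between the “out” copy of i and the “in” copy of j.

```python
# Minimum number of inputs for structural controllability via maximum matching:
# m = max(n - M, 1), where M is the size of a maximum matching.
import networkx as nx

def min_inputs(n, edges):
    G = nx.Graph()
    out_nodes = [("out", i) for i in range(n)]
    G.add_nodes_from(out_nodes, bipartite=0)
    G.add_nodes_from((("in", i) for i in range(n)), bipartite=1)
    G.add_edges_from((("out", i), ("in", j)) for i, j in edges)
    matching = nx.bipartite.maximum_matching(G, top_nodes=out_nodes)
    M = len(matching) // 2  # the matching dict lists each matched pair twice
    return max(n - M, 1)

print(min_inputs(3, [(0, 1), (1, 2)]))  # a directed chain: one input suffices
print(min_inputs(3, [(0, 1), (0, 2)]))  # a star out of node 0: two inputs needed
```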

The point of the Cowan et al. paper is to explain that the results of Liu et al. are completely uninteresting if a_{ii} is non-zero for every i. This is because M is then equal to n (a maximum matching of A consists of all the self-loops), and therefore the result of Liu et al. reduces to the statement that m=1, or equivalently, that structural controllability of real-world networks can always be achieved with a single control input.
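A quick illustration of this point, reusing the min_inputs function from the sketch above: once every node has a self-loop, the self-loops form a perfect matching and a single input always suffices.

```python
# With a self-loop at every node the matching is perfect (M = n), so m = 1
# no matter what the rest of the topology looks like.
star = [(0, 1), (0, 2)]
loops = [(i, i) for i in range(3)]
print(min_inputs(3, star))          # 2 inputs without self-loops
print(min_inputs(3, star + loops))  # 1 input once every a_ii is non-zero
```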

Unfortunately for Liu and coauthors, barring pathological canceling out of intrinsic dynamics with self-regulation, the diagonal elements a_{ii} of A will be zero only if there are no intrinsic dynamics in the system (equivalently, p_i=0, or the time constants \frac{1}{p_i} are infinite). I repeat the obvious by quoting from Cowan et al.:

“However, infinite time constants at each node do not generally reflect the dynamics of the physical and biological systems in Table 1 [of Liu et al.]. Reproduction and mortality schedules imply species-specific time constants in trophic networks. Molecular products spontaneously degrade at different rates in protein interaction networks and gene regulatory networks. Absent synaptic input, neuronal activity returns to baseline at cell-specific rates. Indeed, most if not all systems in physics, biology, chemistry, ecology, and engineering will have a linearization with a finite time constant.”

Cowan et al. go a bit further than simply refuting the premise and results of Liu et al. They avoid the naïve reduction of a system with intrinsic dynamics to one with self-loops, and provide a specific criterion for the number of nodes in the graph that must be controlled.

In summary, just as with the rounding of decimal digits, a (simple-looking) assumption of Liu et al., namely that p=0, completely changes the qualitative nature of the result. Moreover, it renders false the thesis of the Liu et al. paper, namely that degree distributions in (real) networks affect the requirements for controllability.

Oops.