When I was an undergraduate at Caltech I took a combinatorics course from Rick Wilson who taught from his then just published textbook A Course in Combinatorics (co-authored with J.H. van Lint). The course and the book emphasized design theory, a subject that is beautiful and fundamental to combinatorics, coding theory, and statistics, but that has sadly been in decline for some time. It was a fantastic course taught by a brilliant professor- an experience that had a profound impact on me. Though to be honest, I haven’t thought much about designs in recent years. Having kids changed that.
A few weeks ago I was playing the card game Colori with my three year old daughter. It’s one of her favorites.
The game consists of 15 cards, each displaying drawings of the same 15 items (beach ball, boat, butterfly, cap, car, drum, duck, fish, flower, kite, pencil, jersey, plane, teapot, teddy bear), with each item colored using two of the colors red, green, yellow and blue. Every pair of cards contains exactly one item that is colored exactly the same. For example, the two cards the teddy bear is holding in the picture above are shown below:
The only pair of items colored exactly the same are the two beach balls. The gameplay consists of shuffling the deck and then placing a pair of cards face-up. Players must find the matching pair, and the first player to do so keeps the cards. This is repeated seven times until there is only one card left in the deck, at which point the player with the most cards wins. When I play with my daughter “winning” consists of enjoying her laughter as she figures out the matching pair, and then proceeds to try to eat one of the cards.
An inspection of all 15 cards provided with the game reveals some interesting structure:
Every card contains exactly one of each type of item. Each item therefore occurs 15 times among the cards, with fourteen of the occurrences consisting of seven matched pairs, plus one extra. This is a type of partially balanced incomplete block design. Ignoring for a moment the extra item placed on each card, what we have is 15 items, each colored one of seven ways (v=15*7=105). The 105 items have been divided into 15 blocks (the cards), so that b=15. Each block contains 14 elements (the items) so that k=14, and each element appears in two blocks (r=2). If every pair of different (colored) items occurred in the same number of cards, we would have a balanced incomplete block design, but this is not the case in Colori. Each item occurs in the same block as 26 (=2*13) other items (we are ignoring the extra item that makes for 15 on each card), and therefore it is not the case that every pair of items occurs in the same number of blocks as would be the case in a balanced incomplete block design. Instead, there is an association scheme that provides extra structure among the 105 items, and in turn describes the way in which items do or do not appear together on cards. The association scheme can be understood as a graph whose nodes consist of the 105 items, with edges between items labeled either 0,1 or 2. An edge between two items of the same type is labeled 0, edges between different items that appear on the same card are labeled 1, and edges between different items that do not appear on the same card are labeled 2. This edge labeling is called an “association scheme” because it has a special property, namely the number of triangles with a base edge labeled k, and other two edges labeled i and j respectively is dependent only on i,j and k and not on the specific base edge selected. In other words, there is a special symmetry to the graph. Returning to the deck of cards, we see that every pair of items appears in the same card exactly 0 or 1 times, and the number depends only on the association class of the pair of objects. This is called a partially balanced incomplete block design.
The author of the game, Reinhard Staupe, made it a bit more difficult by adding an extra item to each card making the identification of the matching pair harder. The addition also ensures that each of the 15 items appears on each card. Moreover, the items are permuted in location on the cards, in an arrangement similar to a latin square, making it hard to pair up the items. And instead of using 8 different colors, he used only four, producing the eight different “colors” of each item on the cards by using pairwise combinations of the four. The yellow-green two-colored beach balls are particularly difficult to tell apart from the green-yellow ones. Of course, much of this is exactly the kind of thing you would want to do if you were designing an RNA-Seq experiment!
Instead of 15 types of items, think of 15 different strains of mice. Instead of colors for the items, think of different cellular conditions to be assayed. Instead of one pair for each of seven color combinations, think of one pair of replicates for each of seven cellular conditions. Instead of cards, think of different sequencing centers that will prepare the libraries and sequence the reads. An ideal experimental setup would involve distributing the replicates and different cellular conditions across the different sequencing centers so as to reduce batch effect. This is the essence of part of the paper Statistical Design and Analysis of RNA Sequencing Data by Paul Auer and Rebecca Doerge. For example, in their Figure 4 (shown below) they illustrate the advantage of balanced block designs to ameliorate lane effects:
Figure 4 from P. Auer and R.W. Doerge’s paper Statistical Design and Analysis of RNA Sequencing Data.
Of course the use of experimental designs for constructing controlled gene expression experiments is not new. Kerr and Churchill wrote about the use of combinatorial designs in Experimental Design for gene expression microarrays, and one can trace back a long chain of ideas originating with R.A. Fisher. But design theory seems to me to be a waning art insofar as molecular biology experiments are concerned, and it is frequently being replaced with biological intuition of what makes for a good control. The design of good controls is sometimes obvious, but not always. So next time you design an experiment, if you have young kids, first play a round of Colori. If the kids are older, play Set instead. And if you don’t have any kids, plan for an extra research project, because what else would you do with your time?
8 comments
Comments feed for this article
February 23, 2015 at 12:47 pm
Alex Fink
This reminds me of a card game Allen Knutson owns and has talked about on g+. The cards are the lines in P^2(F_7), or at least would be if the printers hadn’t omitted three, and the points are little pictures; the objective is to spot the single picture in common on two cards.
February 23, 2015 at 3:51 pm
Lior Pachter
I think you may be talking about Dobble http://eljjdx.canalblog.com/archives/2014/07/06/30181178.html (in French)
February 24, 2015 at 7:13 am
Nicolas
Dobble is called “Spot It” in English http://www.blueorangegames.com/index.php/spot-it
February 24, 2015 at 10:57 am
Neel P
Using combinatorial reasoning to balance or randomize experimental designs was – and still is – very prominent in psychology, psychophysics, and fMRI experiments. I really enjoyed this recent article:
http://nautil.us/issue/6/secret-codes/safecracking-the-brain
(though I doubt de Bruijn sequences will ultimately “safecrack” the brain 🙂 )
February 25, 2015 at 6:37 am
anon
I’m not a mathematician, let alone a professor of Mathematics, but how do you get 8 colors from pairwise combinations of 4 colors? (You write: “And instead of using 8 different colors, he used only four, producing the eight different “colors” of each item on the cards by using pairwise combinations of the four”). I tried to work it out on paper, but i can’t get eight, i only get six (or twelve), not eight. Thank you!
February 25, 2015 at 9:31 pm
Random Variable
Wilson is definitely a great mathematician.
You mentioned applications to coding, stat, and combinatorics. Some of the best applications of designs are in pre-key distribution so do not forget the applications to security!!
February 28, 2015 at 1:03 pm
Curt F.
Design of experiments may be waning in molecular biology but it is thriving in most other fields of empirical science. The widespread use of s software that calculates D optimal designs, whether balanced, blocked, or not, is probably one driver. I’m not a mathematician and so I may be missing something, but I’m a big fan of Brad Jones’ recent designs such as ‘definitive screening’ designs and others. Of course those are for continuous theater than discrete design spaces so it may not be as relevant to combinatorics.
August 21, 2017 at 6:09 pm
Random Variable
Wilson is great. I agree with you that interest in math is declining in new students and there is too little investment