In an essay on frighteningly ambitious startups, Paul Graham writes:
…in 2004 Bill Clinton found he was feeling short of breath. Doctors discovered that several of his arteries were over 90% blocked and 3 days later he had a quadruple bypass. It seems reasonable to assume Bill Clinton has the best medical care available. And yet even he had to wait till his arteries were over 90% blocked to learn that the number was over 90%. Surely at some point in the future we’ll know these numbers the way we now know something like our weight. Ditto for cancer. It will seem preposterous to future generations that we wait till patients have physical symptoms to be diagnosed with cancer. Cancer will show up on some sort of radar screen immediately.
An amazing paper in the March 16 issue of Cell illustrates the frontier of what is possible. Geneticist Michael Snyder of Stanford led a team that sequenced his genome and his mother's. Then, over a two-year period, they used blood and other assays to track transcripts, proteins and metabolites in Snyder's body. In the process they generated billions of data points and were able to watch in near real time what happened as Snyder's body fought two infections and the surprising onset of diabetes.
From a writeup in Science Daily:
…“We generated 2.67 billion individual reads of the transcriptome, which gave us a degree of analysis that has never been achieved before,” said Snyder. “This enabled us to see some very different processing and editing behaviors that no one had suspected. We also have two copies of each of our genes and we discovered they often behave differently during infection.” Overall, the researchers tracked nearly 20,000 distinct transcripts coding for 12,000 genes and measured the relative levels of more than 6,000 proteins and 1,000 metabolites in Snyder’s blood.
…The researchers identified about 2,000 genes that were expressed at higher levels during infection, including some involved in immune processes and the engulfment of infected cells, and about 2,200 genes that were expressed at lower levels, including some involved in insulin signaling and response.
…The exercise was in stark contrast to the cursory workup most of us receive when we go to the doctor for our regular physical exam. “Currently, we routinely measure fewer than 20 variables in a standard laboratory blood test,” said Snyder, who is also the Stanford W. Ascherman, MD, FACS, Professor in Genetics. “We could, and should, be measuring many, many thousands.”
One side note: the techniques the authors use to analyze their time-series data seem (to me) to be behind the curve compared to the vector autoregressions (VARs) used in econometrics. Impulse response functions, which trace how a shock to one variable propagates through all the others over time, are what they need! With applications from economics to medicine to marketing, the statistics of big data is the field of the future.
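For readers who have not met impulse response functions, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the two series are simulated stand-ins (call them an immune marker and a glucose marker), the coefficient matrix `A_true` is invented, and the helper names `solve`, `fit_var1` and `impulse_response` are hypothetical. It fits a one-lag VAR by ordinary least squares and traces a unit shock forward. It is not the method of the Cell paper, and it skips the orthogonalization of shocks that applied econometric work usually adds.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_var1(series):
    """OLS estimate of A in y_t = A y_{t-1} + e_t (no intercept; zero-mean data assumed)."""
    k = len(series[0])
    lagged, current = series[:-1], series[1:]
    xtx = [[sum(x[i] * x[j] for x in lagged) for j in range(k)] for i in range(k)]
    ytx = [[sum(y[i] * x[j] for x, y in zip(lagged, current)) for j in range(k)]
           for i in range(k)]
    # Row i of A solves (X'X) a_i = (Y'X)_i because X'X is symmetric.
    return [solve(xtx, row) for row in ytx]


def impulse_response(A, shock, horizon):
    """Response of every variable at steps 0..horizon to a one-time shock at step 0."""
    path = [list(shock)]
    for _ in range(horizon):
        prev = path[-1]
        path.append([sum(A[i][j] * prev[j] for j in range(len(prev)))
                     for i in range(len(A))])
    return path


# Simulated data: the immune marker feeds into the glucose marker, not the reverse.
A_true = [[0.5, 0.0],
          [0.3, 0.4]]
random.seed(0)
series = [[0.0, 0.0]]
for _ in range(5000):
    prev = series[-1]
    series.append([sum(A_true[i][j] * prev[j] for j in range(2)) + random.gauss(0, 1)
                   for i in range(2)])

A_hat = fit_var1(series)
# Trace a unit shock to the immune marker through both series for five steps.
for step, response in enumerate(impulse_response(A_hat, [1.0, 0.0], 5)):
    print(step, [round(v, 3) for v in response])
```

The second column of the printout is the point: it shows how long a jolt to one measurement keeps echoing in another, which is the kind of question one would want to ask of an infection and insulin signaling.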