At the Frontier of Personalized Medicine

In an essay on frighteningly ambitious startups Paul Graham writes:

…in 2004 Bill Clinton found he was feeling short of breath. Doctors discovered that several of his arteries were over 90% blocked and 3 days later he had a quadruple bypass. It seems reasonable to assume Bill Clinton has the best medical care available. And yet even he had to wait till his arteries were over 90% blocked to learn that the number was over 90%. Surely at some point in the future we’ll know these numbers the way we now know something like our weight. Ditto for cancer. It will seem preposterous to future generations that we wait till patients have physical symptoms to be diagnosed with cancer. Cancer will show up on some sort of radar screen immediately.

An amazing paper in the March 16 issue of Cell illustrates the frontier of what is possible. Geneticist Michael Snyder of Stanford led a team that sequenced his and his mother’s genomes. Then, over a two-year period, they used blood and other assays to track transcripts, proteins and metabolites in Snyder’s body. In the process they generated billions of data points and were able to watch in near real time what happened as Snyder’s body fought two infections and the surprising onset of diabetes.

From a writeup in Science Daily:

…“We generated 2.67 billion individual reads of the transcriptome, which gave us a degree of analysis that has never been achieved before,” said Snyder. “This enabled us to see some very different processing and editing behaviors that no one had suspected. We also have two copies of each of our genes and we discovered they often behave differently during infection.” Overall, the researchers tracked nearly 20,000 distinct transcripts coding for 12,000 genes and measured the relative levels of more than 6,000 proteins and 1,000 metabolites in Snyder’s blood.

…The researchers identified about 2,000 genes that were expressed at higher levels during infection, including some involved in immune processes and the engulfment of infected cells, and about 2,200 genes that were expressed at lower levels, including some involved in insulin signaling and response.

…The exercise was in stark contrast to the cursory workup most of us receive when we go to the doctor for our regular physical exam. “Currently, we routinely measure fewer than 20 variables in a standard laboratory blood test,” said Snyder, who is also the Stanford W. Ascherman, MD, FACS, Professor in Genetics. “We could, and should, be measuring many, many thousands.”

One side-note: the techniques that the authors use to analyze their time-series data seem (to me) to be behind the curve compared to the VARs used in econometrics. Impulse response functions are what they need! With applications from economics to medicine to marketing, the statistics of big data is the field of the future.
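For readers who have not met the term, here is a minimal sketch of what an impulse response function is, assuming a two-variable VAR(1) whose coefficient matrix is already known. The matrix `A` and the variable labels are made up for illustration and are not estimated from the Snyder data; in practice the model would have to be fitted first, and identifying the shocks is a problem of its own.

```python
# Impulse responses of a VAR(1): y_t = A @ y_{t-1} + e_t.
# The response at horizon h to a one-time shock e_0 is A^h @ e_0.
# A and the labels below are hypothetical, chosen only for illustration.

def mat_vec(matrix, vector):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def impulse_response(A, shock, horizons):
    """Return the responses at horizons 0, 1, ..., horizons."""
    responses = [list(shock)]
    for _ in range(horizons):
        responses.append(mat_vec(A, responses[-1]))
    return responses

# Row i, column j: effect of variable j last period on variable i this period.
A = [[0.5, 0.0],   # "inflammation marker" depends only on its own past
     [0.2, 0.8]]   # "glucose" depends on past inflammation and its own past

# Unit shock to the inflammation marker at time 0, nothing to glucose.
irf = impulse_response(A, shock=[1.0, 0.0], horizons=3)
for h, (infl, gluc) in enumerate(irf):
    print(f"h={h}: inflammation={infl:.3f}, glucose={gluc:.3f}")
```

In this toy setup the shock to the first variable dies out geometrically, while the second variable keeps responding for several periods. That delayed, cross-variable reaction is what an impulse response function is meant to trace.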

Addendum: Derek Lowe offers further thoughts and Andrew S. points us to this TED video on blood tests without needles.


Impulse response may be a good tool in the future, but right now most research is just trying to figure out the different pathways: knock out (disable) a gene and see what happens, and so on. There just is not enough information on the cellular responses to create good models, regardless of what computational biologists might say.

I have no idea what economics tools are, but there are all these data output streams (e.g. urine) and a lot of things will correlate to diseases. So, imagine a dongle called the USPee. You get the idea.

My understanding is that the technology is already available for cell phone use. The problem with modern medicine, as you note, is that it is only a snapshot in time after an event happens. Current technology is available that allows your cell phone to track vitals and submit them at regular intervals. You now have a time series of data that can find anomalies before an event happens.
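As a rough sketch of what "finding anomalies" in such a series could mean, assume readings arrive at regular intervals and flag any reading that sits far from the recent average. The function name, window size, threshold and heart-rate numbers below are all hypothetical; a real monitoring device would use something far more careful.

```python
from statistics import mean, stdev

def find_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings more than `threshold` standard deviations
    away from the mean of the preceding `window` readings."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip flat histories, where the standard deviation is zero.
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged

# Hypothetical resting heart rate in beats per minute, one reading per interval.
heart_rate = [62, 64, 63, 61, 65, 63, 62, 64, 95, 63]
print(find_anomalies(heart_rate))  # prints [8]
```

Only the spike to 95 is flagged. Note that the spike also inflates the spread of the window that follows it, which is one reason a simple rule like this would need refinement before anyone relied on it.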

I currently have a student with a pacemaker that transmits information on her heart rate, bpm, etc. to her doctor who can then spot an anomaly and call her in immediately. It's already been used a few times.

Similar technology exists for patients with diabetes. For people treated with an insulin pump, the technology now exists so that: a) their blood glucose is measured continuously, b) the pump automatically stops the insulin for two hours if the blood glucose reaches a lower limit, and c) the pump can sound an alarm if the blood glucose is too low.

There are problems with the sensors, which are not yet as precise as patients would like (so some patients might become unresponsive before the alarm goes off), but this is a huge step forward compared to a few decades ago. Note that this technology completely skips 'the middle man', the doctor. Here's a Danish link, in case anyone cares (google translate is your friend, I guess, but it doesn't contain much information I haven't mentioned already):

Clarification: a), b) and c) do not all apply to all diabetics who are treated with a pump (far from it), but the technology exists. According to the article, in Slovenia the sensor (which measures blood glucose continuously) is paid for by the government for all children aged 0-7.

Imagine what the average individual who does not understand correlation, confidence intervals, etc. will do when given a report that indicates they "may" develop cancer or some other disease.

I think we should invest in anxiety medicines. We're going to have a bunch of emotionally paralyzed individuals, or ones which will take more risk than they would have otherwise because they "may" die soon.

Or, the person who may be developing cancer will know he's also not developing diabetes and can adjust accordingly.

I'd rather have too much info than not enough. I suspect most people would agree with this. And if for some reason they don't want the info then don't use the device that measures this. Problem solved.

I hate to break it to you, but you already have cancer. And blocked arteries. And a thousand other problems.

The point is your body can keep those systems in check, until it gets older, and then it can't. No test is going to change that.

No you don't have those things in any sense. Those things are the macro damage. The beginnings of those things can be addressed earlier than we do. Metastatic cancer cannot be held in check, it has to be stopped before it starts. You are right about aging being an ultimate contributor to the inability of the body to accommodate damage. However, aging can be treated as a separate thing.

If, by using my arsenal of sophisticated techniques, I did inform a 20-year-old that he has a nascent proto-blockage, is it going to have a substantial influence on his lifestyle?

If knowledge was the issue we wouldn't have any smokers. The link and likelihood of smoking leading to death is far clearer than any proto-blockage.

There is a huge range of outcomes from smoking based on genetics, and it is also highly addictive. If you told someone they definitely had the genetic makeup to make smoking quickly deadly, even that would make smoking different than it is today. If you told someone all they had to do was take an aspirin a day, that's much easier than quitting smoking.

Does the gas mileage thing you stick on your car change people's driving habits? I suspect it does.


We know how to extend life. Exercise, don't smoke and don't get fat. Bam! You'll live a healthy life until 75 or 80, your genetics will take over, and the last 10 years will be bad.

Avoiding the doctor after 40 -- rather than more data -- is the key to a healthy life.

Those things are costs like everything else. The people at the gym like moving and they clog up the joint. Knowing how much exercise you need if you are not a mover will reduce the cost of exercise and increase the likelihood of doing it. What you are basically assuming is that no one gets enough exercise and "more" is always the answer.


What percentage of the American population do you think gets enough exercise?

"Lifestyle" is really a collection of habits -- good and bad. Good habits can be hard to instill and bad ones hard to break. There's been a lot of research recently on how to be more successful, such as:

1. Only try to change one habit at a time, and give yourself at least 30 days to do so
2. Tie the habit to a reward (smoking gives you 8-10 rewards with every cigarette in the form of a slight high or at least anxiety reduction)
3. Make the habit specific and measurable. "Exercise more" is not a habit. "Go to the gym every Tuesday, Thursday and Saturday" is.

My point? Information about health risks is not enough. Most people have poorly developed tools for instilling good habits. Until we address that, lifestyle change on a mass scale will not happen.

Knowledge of the link between smoking and lung cancer hasn't led to the end of smoking, but it has reduced the number of smokers. I for one would have probably smoked cigarettes 40 years ago, but I was a teenager in the 90s, not the 70s, and because I was well-aware of the risks associated with smoking, I didn't smoke. Who cares if not everyone will heed early medical warnings? Many people will, even many dumb 20-year-olds.

It's funny how smoking has attained this separate status even though it's like almost everything else. Each cigarette marginally affects your health on a molecular and cellular basis and, over the course of 20-100 years of doing it, depending on your (epi)genetics, causes gross problems.

I find it comical that on one end, we have mind-bogglingly sophisticated modern medicine and on the other, we have a large portion of the population that doesn't seem to have either the will or the knowledge to take basic care of their own health. Sure, genetic and real time testing as well as econometric analysis can help us track the onset of someone's diabetes, but should we really allocate that level of resources to someone if they eat terribly and never exercise? In Bill's case he completely changed his diet and exercise habits, but I can't see that being the norm. This is obviously more of an issue when it comes to treatment than it is to early diagnosis.

I wonder how much this whole diagnostic exercise cost him? And how does it compare to the current median spending on healthcare?

The cost of tracking even a large number of variables is quite low given the right software. I use an app called KeepTrack for Android and Stata on OSX for data analysis.

Not the analysis; but what about the sensing and measurement?

With all those time series, they're probably going to need some of the large-dimensional versions, like a factor-augmented VAR.

Those who get excited by every published medical correlation will presumably start screaming at the clear evidence that having your genome done is a risk factor for diabetes. The more fruitcake of them will wonder about the clinical significance of having your mother's genome done too.

What are the benefits of VAR over the estimation methods used in the study? More efficient, less bias, consistent?

Maybe better out of sample predictions.

Oh, if only every branch of "science" could be as well-grounded and statistically robust as economics...

This sounds like taking the problem of prostate screening, and making it a million times worse.

PG talks about Clinton nearly having a heart attack, and thinks "what if we had found it sooner?" But there are thousands of things that can go wrong with someone. You would have to screen for all those thousands of things, and probably have dozens of false positives along the way.

I don't think overcoming this is 50 or 100 years off. Maybe 200 years off.

A sure way to make health care 99% of GDP!

I am disappointed, as commenter #21, in both the original poster and this esteemed audience, to have to bring up such elementary statistics as Bonferroni correction, false positives, and ROC analysis.
“Currently, we routinely measure fewer than 20 variables in a standard laboratory blood test,” said Snyder, who is also the Stanford W. Ascherman, MD, FACS, Professor in Genetics. “We could, and should, be measuring many, many thousands.”
At the standard 95% confidence level for most lab tests, if physicians did this a normal person would have dozens of abnormal results, each of which would have to be evaluated at enormous cost (CT) or risk (angiography) to rule out true positivity.
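The arithmetic behind that objection can be sketched in a few lines. This assumes a healthy person, a reference range for each test that covers 95% of healthy people, and tests that are independent of one another. The independence assumption is unrealistic and is made only to keep the calculation simple, and the function names are illustrative.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of abnormal results for a healthy person."""
    return num_tests * alpha

def prob_at_least_one(num_tests, alpha=0.05):
    """Chance of at least one abnormal result, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_alpha(num_tests, family_alpha=0.05):
    """Per-test threshold that keeps the overall false-alarm rate
    at or below family_alpha (Bonferroni correction)."""
    return family_alpha / num_tests

for m in (20, 1000):
    print(
        f"{m} tests: expect {expected_false_positives(m):.0f} false positives, "
        f"P(at least one) = {prob_at_least_one(m):.3f}, "
        f"Bonferroni per-test alpha = {bonferroni_alpha(m):.5f}"
    )
```

Under these assumptions, 20 tests already give roughly a 64% chance of at least one abnormal result, and 1,000 tests give about 50 expected false positives. The Bonferroni threshold controls that, but at the price of making each individual test much less sensitive.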

There is plenty of evidence from RCTs that stenosis below 70% is usually best left untreated. Knowing whether your average stenosis is 55% or 75% would have no impact on your treatment. American Heart Association guidelines for CABG are a good start for those interested.

Really, the problem is physician productivity and the quality of diagnostic data, not the amount. Please do not make us look at more so-so data.


FYI, I've been to one of his lectures here, and they do correct for that. If you're interested in this stuff, I'd also recommend Euan Ashley's work, also at Stanford School of Medicine.


You are thinking in terms of the current thresholds of physician treatment. What if the treatment was, using the example of getting diabetes in the article, lay off the sugar and don't get RSV this month?

Because, when the so-so data is dumped into the computer it won't have to be very precise. The data from the single individual generated the hypothesis that RSV infection triggered his diabetes.

Exactly how do you avoid RSV? I've been trying for years.

The diabetes (questionable diagnosis) was only related to measuring his glucose level, not some fancy-ass test.


You have some baseline probability of developing a disease, say 5%. According to one approach, you have characteristics X Y Z and this means your probability is actually much higher, more like 12%. Now what?

But wait, according to another approach, you have characteristics A B C and this means your probability is actually between 3% and 6%. Now what?

OK, some brilliant economist has figured out some way of combining these approaches, his best guess is that the real probability is 7%, with some wide confidence interval around that.

Is it obvious what you're supposed to do yet? Should we collect more data?

The answer is obvious. Prescribe many strong drugs.

Many people are assuming this model where the patient shows up at the doctor's office with hat in hand and gratefully accepts whatever indignity and treatment the doctor mandates. First of all, that's not the general case. Secondly, the technologies we are talking about here represent another brick taken out of that wall. Third, we are also talking about the birth of algorithmic medicine where no judgment or mandates are required by the doctor and the patient retains all veto power.

If only the Soviets had this amount of information they could have managed their economy better.

In other words we have a giant new haystack to search for more needles.

These advances may have their niche role. The central problem of healthcare is not that we don't have good diagnostics; it is how to get the current state-of-the-art to the people.

No, the problem with health care is that 80% of your lifetime medical expenses come in the last six months of life.

You are both right, and also there's more: over-prescription and testing for income and liability reasons, the aging demographic bulge, and many other problems.

The problem of healthcare is that there are MANY problems of health care. It's the one thing out there I find to be 'unsolvable'.

Funny how no one has mentioned that it is advances such as this that drive medical costs. Yes, running a shotgun array of tests may help you get to the answer, but at the expense of losing familiarity with cheaper and older tests, just as the current generation of physicians in training has never learned most of the minutiae of the physical exam, which is even cheaper than ordering basic labs and imaging.

At which point do people actually say they're willing to spend a lot less for a smaller marginal decrease in correct diagnosis?

What is it about DNA that brings out the popular-science-loving, we-will-have-flying-cars kid in otherwise sober, grown-up PhDs?
I notice the same thing on
I do DNA for a living; we are a loooong way from making this practical for someone without millions of dollars (if you think I'm exaggerating, try buying some microarray slides or hiring a bioinformaticist).

To my mind, if you have a person with an infection or arterial problems, the need to measure 1,000s of genes or transcripts (splice variants?) or proteins (phospho variants, anyone?) or metabolites implies that we don't understand what is going on; biology doesn't work like that. If you understand the system, you can get by with a lot less data.

A few years ago, I was at a microarray conference in Boston, and a scientist from, I think, Abbott gave a talk on chemicals that cause liver disease (like chloroform). So, they fed some rats some CHCl3, or not, and did microarrays, looking at thousands of genes; upshot, all of the info could be obtained from 40 genes.
Now, maybe if you were looking at people, who are more genetically and environmentally heterogeneous, and who can tell you how they feel in detail, you might need more genes, but I doubt it (nowadays, I think you would do a multiplex PCR, either in an old-fashioned plate, or maybe with a Fluidigm assay).

PS: the idea that we are gonna do non-invasive blood tests in the near future is a NON-STARTER; just the sampling issues alone (if you have a drop, how do you know the drop is representative of the rest of the blood?) kill this dead. It is sort of like Quiggin's zombie economics; as long as I've been in diagnostics, people keep talking about it... ain't happening.

Or at least, I don't think you can measure viral RNA, or IL-1beta, or LDL/HDL with a non-invasive test; I don't care how good your multidimensional NMR/non-coherent absorbance spectroscopy/ICP-MS is.
