Explaining Regression

During his Daily Show appearance, Steve Levitt said that in estimating the effect of abortion on crime he controlled for other variables like police and prisons.  Jon Stewart pressed Steve for an explanation of how someone could "control" for other variables.  Amazingly, Stewart seemed genuinely interested in an answer, but Steve wisely demurred.  The exchange got me thinking: what is the shortest, non-technical, yet reasonably accurate explanation of how this is done?

I think the way to go is to use the Frisch-Waugh-Lovell Theorem.  Here’s my attempt:

Suppose you want to figure out the effect of weight on life expectancy.  Heavy people tend to be tall, so you have to control for height.  You can do this with a two-step procedure.  First, calculate how weight varies with height.  Let’s say you discover that every 1-inch increase in height above, say, 5’7" correlates with a 5-pound increase in weight.  You now subtract from each person’s weight the portion that can be explained by height.  For example, you would subtract 5 pounds from the weight of everyone in the data who is 5’8" and 10 pounds from everyone who is 5’9".  Since height doesn’t explain weight perfectly, you are left with a new variable, weight2, which still differs from person to person.  In the second step, you calculate how life expectancy correlates with weight2.  Since weight2 is "weight after controlling for height," you have now calculated the effect of weight on life expectancy after controlling for height.
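
For readers who would rather check the two-step recipe than take my word for it, here is a minimal sketch in Python with NumPy.  Everything in it is an assumption made for illustration: the data are simulated, the function names are hypothetical, and the "true" effect of -0.05 years of life per pound is invented.  The only number borrowed from the example above is the 5 pounds per inch.  The point is simply that the two-step slope matches the weight coefficient from an ordinary multiple regression that includes height directly.

```python
import numpy as np


def slope_after_controlling(y, x, control):
    """Two-step estimate of the slope of y on x after controlling for `control`."""
    # Step 1: fit x = a + b * control by least squares and keep what is left
    # over, i.e. the part of x that the control cannot explain.
    Z = np.column_stack([np.ones_like(control), control])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x2 = x - Z @ coef  # plays the role of "weight2" in the text
    # Step 2: slope of y on the leftover variable (x2 has mean zero, so the
    # simple-regression slope reduces to this ratio).
    return float(x2 @ y / (x2 @ x2))


def slope_multiple_regression(y, x, control):
    """Coefficient on x when y is regressed on x and the control together."""
    X = np.column_stack([np.ones_like(x), x, control])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])


# Simulated data; every number below is a made-up assumption.
rng = np.random.default_rng(0)
n = 1000
height = rng.normal(67, 3, n)                              # inches
weight = 150 + 5 * (height - 67) + rng.normal(0, 15, n)    # pounds
life = 80 - 0.05 * weight + 0.10 * height + rng.normal(0, 2, n)  # years

two_step = slope_after_controlling(life, weight, height)
one_step = slope_multiple_regression(life, weight, height)

print("two-step slope:           ", two_step)
print("multiple-regression slope:", one_step)
```

The two printed numbers should agree up to floating-point error, which is what the Frisch-Waugh-Lovell Theorem promises, and both should land near the invented -0.05.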

To be clear, I am not suggesting that this is what Steve should have said!  (If asked I would have said, "Well, I could tell you that Jon, but then I would have to bore you.")  I’ve opened the comments section if you have some other ideas.

By the way, the ubiquitous Steve Levitt will be here on Wednesday, but I repeat myself.

