Wow, Justin Wolfers reports on a new NBER paper (ungated) by Trent Alexander, Michael Davern and Betsey Stevenson, that finds big errors in Census data, especially for citizens 65 years and older.
What’s the source of the problem? The Census Bureau purposely messes with the microdata a little, to protect the identity of each individual. For instance, if they recode a 37-year-old expat Aussie living in Philadelphia as a 36-year-old, then it’s harder for you to look me up in the microdata, which protects my privacy. In order to make sure the data still give accurate estimates, it is important that they also recode a 36-year-old with similar characteristics as being 37. This gives you the gist of some of their “disclosure avoidance procedures.” While it may all sound a bit odd, if these procedures are done properly, the data will yield accurate estimates, while also protecting my identity. So far, so good.
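To make the swapping idea concrete, here is a minimal sketch in Python. It is a toy illustration only, not the Census Bureau's actual procedure: the record layout (`sex`, `age`), the function names, and the rule "same sex, ages one year apart" are all assumptions made for the example. The point it shows is that when ages are exchanged in matched pairs, the count of people in each age-by-sex cell is left unchanged.

```python
import random
from collections import Counter


def swap_ages(records, swap_fraction=0.1, seed=0):
    """Toy data swap (illustrative only, not the Census Bureau's procedure).

    For a random subset of records, find an unused partner of the same sex
    whose age differs by exactly one year, and exchange the two ages.
    """
    rng = random.Random(seed)
    swapped = [dict(r) for r in records]  # copy so the input is untouched
    order = list(range(len(swapped)))
    rng.shuffle(order)
    used = set()
    for i in order:
        if i in used or rng.random() >= swap_fraction:
            continue
        for j in order:
            if j == i or j in used:
                continue
            same_sex = swapped[j]["sex"] == swapped[i]["sex"]
            if same_sex and abs(swapped[j]["age"] - swapped[i]["age"]) == 1:
                # Exchange ages in a matched pair, so totals per cell are kept.
                swapped[i]["age"], swapped[j]["age"] = (
                    swapped[j]["age"],
                    swapped[i]["age"],
                )
                used.update((i, j))
                break
    return swapped


def age_sex_counts(records):
    """Count records in each (sex, age) cell."""
    return Counter((r["sex"], r["age"]) for r in records)
```

Comparing `age_sex_counts` before and after the swap should give identical tables, even though individual records now carry a different age. A one-sided recode (the 37-year-old becomes 36 with no matching move in the other direction) would break that equality.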
But the problem arose because of a programming error in how the Census Bureau ran these procedures. The right response is obvious: fix the programs, and publish corrected data. Unfortunately, the Census Bureau has refused to correct the data.
The problem also runs a bit deeper. If the mistake were just the one shown in the above graph, it would be easy to simply re-scale the estimates so that there are no longer too many, say, 85-year-old men – just weight them down a bit. But it turns out that the same coding error also messes up the correlation between age and employment, or age and marital status (and, the authors suspect, possibly other correlations as well). When you break several correlations like this, there’s no easy statistical fix.
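A small worked example may help show why re-weighting cannot repair a broken correlation. The numbers below are invented, and the error mechanism (some 84-year-olds recorded as 85) is a hypothetical stand-in, since the post does not describe the actual bug. Re-weighting brings the count of 85-year-olds back to the right total, but the employment rate among recorded 85-year-olds stays wrong, because a constant weight within an age group cannot change the mix of people inside it.

```python
from collections import Counter


def group(true_age, recorded_age, n, n_employed):
    """Build n synthetic people, the first n_employed of whom are employed."""
    return [(true_age, recorded_age, i < n_employed) for i in range(n)]


# Invented population: 1000 people aged 84 (10% employed) and
# 1000 people aged 85 (2% employed).
# Hypothetical error: 200 of the 84-year-olds are recorded as 85.
sample = (
    group(84, 84, 800, 80)
    + group(84, 85, 200, 20)
    + group(85, 85, 1000, 20)
)

true_count = Counter(t for t, _, _ in sample)
recorded_count = Counter(r for _, r, _ in sample)

# Re-scale each recorded age group so its weighted size matches the truth.
weights = {age: true_count[age] / recorded_count[age] for age in recorded_count}


def true_rate(rows, age):
    """Employment rate among people whose true age is `age`."""
    employed = [e for t, _, e in rows if t == age]
    return sum(employed) / len(employed)


def reweighted_rate(rows, age, w):
    """Weighted employment rate among people recorded as `age`."""
    num = sum(w[r] for _, r, e in rows if r == age and e)
    den = sum(w[r] for _, r, _ in rows if r == age)
    return num / den


reweighted_count_85 = sum(weights[r] for _, r, _ in sample if r == 85)
```

Here `reweighted_count_85` comes out at 1000, matching the true count, yet `reweighted_rate(sample, 85, weights)` is 40/1200 (about 3.3%) against a true rate of 2%. The head count is fixed; the age-employment relationship is not.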
Worse still, the researchers find that related problems afflict the microdata released for other major data sources. All told, they’ve found similar errors in:
- The 2000 Decennial Census.
- The American Community Survey, which is the annual “mini-census” (errors exist in 2003-2006, but not 2001-02, or 2007-08).
- The Current Population Survey, which generates our main labor force statistics (errors exist for 2004-2009).
These microdata have been used in literally thousands of studies and countless policy discussions.
Although there is little chance of that common sense reform being adopted in time for the 2010 Census, the Census Bureau should collect data to adjust for the Census miscount. The Census Bureau has the unique ability to obtain and provide corrected data without the restrictions that limit access to home addresses for incarcerated populations. As a result, the Census Bureau is in the best position to correct its own practice of miscounting prisoners.
E. Barandiaran,
While your comment is mildly topical, I have to completely disagree. As the author has noted, literally thousands and thousands of papers are based on census data. To recall each and every one would be premature, and to call into question every conclusion derived from them would put a massive crater in economic knowledge. We have no idea what the net effect of these data errors was on research. For some estimated parameters it could strengthen conclusions; for others it may have little to no effect. Still others may end up with coefficients whose sign changes entirely; we simply don’t know. Obviously the best thing to do is for the Census Bureau to fix the data, but short of that there are better things to do than throwing out a decade’s worth of research, including comparing parameters estimated in papers during the error-ridden period to the same or similar parameters estimated using earlier, unaffected census data or data from other sources. Papers with conclusions supported by sound theory are less suspicious. Care should be taken when a paper’s conclusions run counter to established theory or rely on unconventional models or assumptions.
At the end of the day, this isn’t a dealbreaker for every paper written in the past ten years; it is more of an asterisk and a reason to look a little deeper.
As a technical matter, E. Barandiaran’s comment sets the stage for a Bayesian nightmare that must follow this revelation, and I cannot understand how wlu2009 reaches such a conclusion.
If we (completely illogically, but typically) assume that every hypothesis test appearing in every published paper was the pure result of a single dive into the data without any “data mining”, (as any preliminary results of different models in theory would require modifications to the hypothesis test,) we’re now facing a situation where we would be required most certainly to re-test all of the hypotheses with the understanding that the model has been corrupted by the first set of tests, based on flawed data.
I would like to see an official statement from, for example, the editorial board of Econometrica explaining how they intend to handle this!
They should have hired some of the Global Warming programmers – they are so faultless that they produce only settled science.
Just to be clear, from the paper:
“Until a solution is devised, researchers should not use the affected samples to conduct analyses that assume a representative sample of the population by age and sex for people ages 65 and older. For any analysis relying on the age variable, we would recommend treating those aged 65 and older as a single analytic category (making no differentiation between men and women), or eliminating those ages 65 and up from the analysis.”
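For anyone wanting to apply that recommendation, a minimal Python sketch follows. The function names and the record layout are hypothetical, and keeping single years of age and sex for people under 65 is an assumption of the sketch, not something the paper prescribes.

```python
def analytic_cell(age, sex):
    """Return the (age_group, sex) cell to use in an analysis.

    Ages 65 and up are pooled into one cell with sex dropped, following the
    recommendation quoted above. Keeping single years of age and sex below
    65 is an assumption made for this sketch.
    """
    if age >= 65:
        return ("65+", None)
    return (age, sex)


def drop_65_plus(records):
    """Alternative from the quote: remove ages 65 and up entirely."""
    return [r for r in records if r["age"] < 65]
```

With this, an 85-year-old man and a 66-year-old woman land in the same `("65+", None)` cell, so the affected age-by-sex detail never enters the analysis.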
Not all of the data, and therefore not all of the work based on the census, is suspect, or at least not yet.
Is this the source of the quote from Wolfers?
http://freakonomics.blogs.nytimes.com/2010/02/02/can-you-trust-census-data/
This is breaking news. Headline of http://www.nytimes.com/2010/02/04/business/global/04toyota.html?hp
Stop Driving Recalled Toyotas, Agency Chief Says
I hope soon to read this headline in the NYT:
Stop Using Recalled Census Data, Agency Chief Says?
At what point does the data become more valuable by not having it?
I guess it depends on the sensitivity of the math and analysis it is being used for. Could the break-even point be calculated? The cost of collecting it is a definite number. There are also the costs of the government having this information that they have to ‘privatize’ due to potentially nefarious use by anyone not in the government (though I’ll be in a minority worrying about that).
Gives new relevance to the phrase “close enough for government work.”
Recently, I found the 2010 Census form hanging on my door. As I began filling it out, I came across a dilemma. The U.S. government wants to know if my children are adopted or not and it wants to know what our races are. Being adopted myself, I had to put “Other” and “Don’t Know Adopted” for my race and “Other” and “Don’t Know” for my kids’ races.
Can you imagine not knowing your ethnicity, your race? Now imagine walking into a vital records office and asking the clerk for your original birth certificate only to be told “No, you can’t have it, it’s sealed.”
How about being presented with a “family history form” to fill out at every single doctor’s office visit and having to put “N/A Adopted” where life saving information should be?
Imagine being asked what your nationality is and having to respond with “I don’t know”.
It is time that the archaic practice of sealing and altering birth certificates of adopted persons stops.
Adoption is a 5 billion dollar, unregulated industry that profits from the sale and redistribution of children. It turns children into chattel who are re-labeled and sold as “blank slates”.
Genealogy, a modern-day fascination, cannot be enjoyed by adopted persons with sealed identities. Family trees are exclusive to the non-adopted persons in our society.
If adoption is truly to return to what is best for a child, then the rights of children to their biological identities should NEVER be violated. Every single judge that finalizes an adoption and orders a child’s birth certificate to be sealed should be ashamed of him/herself.
I challenge all readers: Ask the adopted persons that you know if their original birth certificates are sealed.