How economics has changed

Panel A illustrates a virtually linear rise in the fraction of papers, in both the NBER and top-five series, which make explicit reference to identification.  This fraction has risen from around 4 percent to 50 percent of papers.


Currently, over 40 percent of NBER papers and about 35 percent of top-five papers make reference to randomized controlled trials (RCTs), lab experiments, difference-in-differences, regression discontinuity, event studies, or bunching…The term Big Data suddenly sky-rockets after 2012, with a more recent uptick in the top five.

Note that about one-quarter of NBER working papers in applied micro make references to difference-in-differences. And:

The importance of figures relative to tables has increased substantially over time…

And about five percent of top five papers were RCTs in 2019.  Note also that “structural models” have been on the decline in Labor Economics, but on the rise in Public Economics and Industrial Organization.

That is all from a recent paper by Janet Currie, Henrik Kleven, and Esmee Zwiers, “Technology and Big Data are Changing Economics: Mining Text to Track Methods.”

Via Ilya Novak.


The increase in references to identification makes sense; it's by now an endemic question at seminars and from referees. It's convenient shorthand -- but thus also jargon -- for a critique in which someone questions the researcher's interpretation of the data.

Well, statistically it can be hard to prove whether the sun rises due to the cock crowing or vice versa (Bayes theorem). And for difference-in-differences, here's a good book on my bookshelf (haven't read all of it yet):
Natural Experiments of History, edited by Jared Diamond and James A. Robinson.

Bonus trivia: from history, all 'frontiers' like the Wild West, Argentinian Pampas, USSR wastelands, Brazilian Amazon, etc, go through the same cycle of boom-and-bust (from the book), and that includes wildcat banks.

"Well, statistically it can be hard to prove whether the sun rises due to the cock crowing or vice versa "

LOL, come on Ray, you're smarter than that.

Percentage time the sun rose before the cock crowed: x%
Percentage time the sun rose after the cock crowed: y%
Percentage time the sun rose even though we ate the loud ass chicken for Sunday dinner last week: z%

Statistically, the sun rises no matter what the chicken does.

@JWatts - yes, but you tested P(A|B) and then P(B|A), and found they are the same. In the real world that's not always easy to do.

Bonus trivia: you can derive Bayes theorem from a raw crunching of numbers on a probability tree (frequency analysis); arguably Bayes theorem is a shortcut for the days before numerical analysis. Speaking as a frequentist myself.
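That "raw crunching of numbers on a probability tree" can be sketched in a few lines. The joint counts below are invented for illustration; the point is only that direct counting and the Bayes formula give the same answer:

```python
# Tally joint outcomes of two generic events A and B, then check that direct
# counting gives the same answer as the formula P(A|B) = P(B|A) * P(A) / P(B).
# All counts below are made up for illustration.
from fractions import Fraction

# Joint counts out of 1,000 hypothetical trials:
n_ab, n_a_nb = 180, 420     # A and B; A and not B
n_na_b, n_na_nb = 120, 280  # not A and B; neither
total = n_ab + n_a_nb + n_na_b + n_na_nb

p_a = Fraction(n_ab + n_a_nb, total)
p_b = Fraction(n_ab + n_na_b, total)
p_b_given_a = Fraction(n_ab, n_ab + n_a_nb)

# Direct count: among trials where B happened, how often did A happen?
p_a_given_b_count = Fraction(n_ab, n_ab + n_na_b)
# Bayes' theorem, as a shortcut for the same count:
p_a_given_b_bayes = p_b_given_a * p_a / p_b

print(p_a_given_b_count, p_a_given_b_bayes)  # both 3/5
```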

Pardon the question from a bystander, but what does that mean, "references to identification" in the context of a paper in Economics? I have no idea.

The "identification" problem in econometrics can take many forms. A common, easy-to-understand one is correlation vs causality.

E.g. people who get college degrees have higher incomes than people who don't. Let's say it's a 30% income premium. Does that mean if John Doe goes to college his income will be 30% higher? Or do smarter, more talented people go to college, but they would get higher incomes anyway compared to the dullards in their neighborhood? And if John Doe is a dullard, going to college won't increase his income?

We've got a statistical result, a 30% income difference. But we have an identification problem: did the college degree cause the higher income, or was there some other cause (smarter, more talented kid)?

More generally, the identification problem arises when there are multiple explanations or interpretations of the statistical result. The US pulled out of the Great Depression as the 1930s ended. Was that due to increased military spending as WW II started? Due to the New Deal? Due to the natural workings of the market? Some other cause? What data would we need -- and what econometric techniques would we have to use -- to show, i.e. identify, which explanation is consistent with the data?
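The college example above can be simulated in a few lines to see the problem concretely. Everything here (the ability distribution, the 10 percent causal effect, the 0.15 ability coefficient) is invented for illustration:

```python
# A minimal simulation of the college-premium example: ability raises both the
# chance of attending college and income, so the raw income gap overstates the
# causal effect of the degree. All numbers are invented for illustration.
import random

random.seed(0)
TRUE_DEGREE_EFFECT = 0.10   # the degree causally raises log income by 0.10

people = []
for _ in range(100_000):
    ability = random.gauss(0, 1)
    college = ability + random.gauss(0, 1) > 0   # smarter kids enroll more often
    log_income = 10 + 0.15 * ability + (TRUE_DEGREE_EFFECT if college else 0.0)
    people.append((college, log_income))

grads = [y for c, y in people if c]
others = [y for c, y in people if not c]
raw_gap = sum(grads) / len(grads) - sum(others) / len(others)

# The raw gap comes out well above 0.10: it mixes the degree effect with the
# ability difference between the two groups -- the identification problem.
print(f"raw gap: {raw_gap:.3f} vs. true causal effect: {TRUE_DEGREE_EFFECT}")
```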

Ah, the sort of problem that identifies another example of how economics is not a science.

The warnings about scientism went unheeded. The current crop of “economists” have plunged ahead with statistical analyses of transitory phenomena, which, in sum, add up to zilch.

Economics, as currently practiced, has become a branch of sociology.

"Economics, as currently practiced, has become a branch of sociology."

That's harsh. Traditionally the kinder identification is as a branch of astrology.

And you think I’m harsh?

How does the use of the phrase "identification problem" provide more clarity in your examples than the phrase "causation problem"?

He gave a simpler example of the general idea. In econometrics, the identification problem arises when multiple parameter estimates, usually for a simultaneous-equations model, fit the data equally well. The lack of uniqueness means there are multiple competing explanations that lead to the same result. Econometrics doesn't call it a "causation problem" because causation is a much stronger condition. The exercise here is strictly modeling.

Nothing sums up how far economics is from a true science better than writing "The lack of uniqueness means there are multiple competing explanations that lead to the result."

Competing explanations are never the problem for science; the explanations are tested, in a replicable fashion, until the correct one is established.

Natural history, astronomy, geology, ecology, and several other disciplines do not provide much room for repeatable experiments. Of course, there are sometimes natural experiments but then there are also natural experiments in economics, which is precisely what Matt is alluding to.

Experimental economics is a bit different but suffers from the limitation that people behave differently -- or report different behavior -- when in a lab setting or when answering a 45-minute survey and the results don't always scale up.

Which may be true, if the comment concerned experiments.

Astronomy/astrology or geology have had a number of explanations over many centuries for what is observed around us, but those explanations are still bound by observations -- ones that can in theory be performed by anyone, and that are able to prove or disprove an explanation. For example, we now know that continents are not immutable in shape or position; based on observations, we can also determine the rate of such shifts, thus underpinning the explanation with data. Not experimental data, merely empirical data. Plate tectonics may not be the definitive explanation for how the earth around us looks, but it is an explanation thoroughly supported by data -- and not currently contradicted by any of the collected data.

The age of the earth/universe has had a wide variety of numbers proposed, yet observations allow for a coherent framework to emerge and then be refined, as the incorrect explanations are shown to be incorrect and additional data is collected.

Economics has a long way to go before it is even close to approaching the rigor of geology or astronomy.

You mean close to the easy sciences? :-)

Yeah, that was going to be my comment, but you expressed it more concisely.

The natural sciences are much easier and simpler than the social sciences because molecules, stars, ribosomes, etc. always obey the same laws of nature (or can be assumed to).

Human beings in contrast make decisions and change their behavior. Imagine if tornadoes became aware of where the trailer parks were, or kept tabs on what the meteorologists were saying about them. There's no need for a Lucas Critique in the natural sciences. But it's a perpetual complication in the social sciences.

When I was young I attended a short stats course devoted to Model Discrimination which sounds as if it is a better term for what you mean. It followed a short course on Parameter Estimation.

Once again, how does the phrase "identification problem" add more clarity and precision in the examples he gave than "causation problem".


Read third paragraph of mkt42's explanation. If you are only dealing with two variables (sun rising, rooster crowing) you would be OK calling it "causation." But it also crops up in more complicated situations where there are many variables involved, and where in fact "identification" is a more accurate and useful term.

As it is, the origin of this dates back to the 1920s and a paper by the important population geneticist Sewall Wright and his economist father. He was then working for the USDA and was studying the corn-hog cycle. His original problem was figuring out when one observes changes in prices and quantities to identify the extent to which those were due to changes in supply versus changes in demand, a causation problem, although supply and demand can directly affect each other just to mess things up, something we rarely tell Principles micro students.

I also note that many economists think that the emphasis on identification has gotten overdone, sort of a fad, with some denouncing the so-called "Identification Mafia" who constantly go on about it. One is supposed to find "instrumental variables" to resolve this, but no such IV is perfect, so it becomes a boringly expected whine in a seminar for a hardcore ID mafioso to criticize a presenter's instrumental variable (a favorite in some papers, now widely mocked, is rainfall).

As an editor who reads way too many papers struggling with all this, my view is that if you cannot see the supposed relationship in a simple OLS regression, better yet in a simple bivariate correlation, it is probably not there, although it might be. Conversely, relationships that look like they are there may not be, and that is where worrying about the ID problem comes in: showing that apparent relationships are not really there. Often they will simply disappear once one goes from a bivariate regression to a multiple regression, without any fussing over the ID problem (though fussing over it might reestablish a relationship that appeared to disappear in a multiple regression). This is indeed a serious matter, even if the ID mafia has gone a bit nuts.

"But it also crops up in more complicated situations where there are many variables involved, and where in fact "identification" is a more accurate and useful term."

I don't buy that at all. If there are two, three, four, five or more possible explanations, "causation" is still the issue. "Identification" doesn't add anything useful to the discussion.

You are right not to buy any of this.

Some of the comments above are absolutely correct, but are unhelpful in explaining what is at stake.

Best way to proceed is with the usual example: Regress, or correlate, quantity on or with price. What do you have? A supply curve? A demand curve? No, you have garbage. But if you know more about supply and demand curves, e.g. that demand depends upon income, too, and that supply depends upon the weather [say], you can use this information to retrieve the supply and demand curves.

This is not about more explanations: We have one, and we test it, but can only test it if we think we know a lot more than a simple correlation.

It's all perfectly good science. :-)
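The supply-and-demand example above can be made concrete with a small simulation. The model and every coefficient below are invented for illustration; the point is that the quantity-on-price regression recovers neither curve, while a demand shifter (income) traces out the supply curve:

```python
# Simulated equilibrium data: demand q = B_DEMAND*p + 2*income + e_d,
# supply q = B_SUPPLY*p + weather + e_s. OLS of q on p is "garbage"; using
# income as an instrument recovers the supply slope. All numbers invented.
import random

random.seed(1)
B_DEMAND, B_SUPPLY = -1.0, 0.5   # true demand and supply slopes

p_obs, q_obs, inc_obs = [], [], []
for _ in range(50_000):
    income, weather = random.gauss(0, 1), random.gauss(0, 1)
    e_d, e_s = random.gauss(0, 1), random.gauss(0, 1)
    # equilibrium price equates quantity demanded and quantity supplied
    p = (2 * income + e_d - weather - e_s) / (B_SUPPLY - B_DEMAND)
    q = B_SUPPLY * p + weather + e_s
    p_obs.append(p); q_obs.append(q); inc_obs.append(income)

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

ols_slope = cov(p_obs, q_obs) / cov(p_obs, p_obs)     # a blend of both curves
iv_slope = cov(inc_obs, q_obs) / cov(inc_obs, p_obs)  # income as instrument
print(f"OLS: {ols_slope:.2f}, IV via income: {iv_slope:.2f}, true supply slope: {B_SUPPLY}")
```

The IV estimate lands on the supply slope because income shifts only the demand curve, so the equilibrium points it induces slide along the supply curve.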


There is another reason to avoid calling this the "causation problem." That is that "causality" is something that involves quite different sorts of econometric tests, known as "Granger causality," after the late Nobelist, Clive Granger. In essence those tests boil down to finding out order of variable changes in time sequence, with the implication that what goes first "causes" what goes later. But we know in reality that is not always true. The cock may crow just before the sun rises in anticipation of its rise, just as people engage in economic behavior in anticipation of an expected future event that comes later after the behavior that the expectation of that event caused.

Indeed, these sorts of complications also show up in identification, which is less directly tied to testing for "causation," even though causal issues are often what is at stake in all the arguing that goes on about the identification problem and the ongoing search for appropriate instrumental variables for specific studies.
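The Granger idea described above -- lagged values of one variable helping to predict another beyond its own lags -- can be sketched with simulated data where x really does lead y by one period (all coefficients invented):

```python
# A bare-bones sketch of Granger causality: x "Granger-causes" y if lagged x
# improves the prediction of y beyond y's own lags. Data simulated so that x
# leads y by one period; all coefficients are invented.
import random

random.seed(2)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.3 * y[t - 1] + 0.8 * x[t - 1] + random.gauss(0, 1))

yt, ylag, xlag = y[1:], y[:-1], x[:-1]

# Restricted model: predict y_t from y_{t-1} alone (no intercept; the series
# are mean zero by construction).
b_r = sum(l * yi for l, yi in zip(ylag, yt)) / sum(l * l for l in ylag)
sse_restricted = sum((yi - b_r * l) ** 2 for yi, l in zip(yt, ylag))

# Unrestricted model: add lagged x, solving the 2x2 normal equations by hand.
s11 = sum(a * a for a in ylag)
s22 = sum(a * a for a in xlag)
s12 = sum(a * c for a, c in zip(ylag, xlag))
s1y = sum(a * yi for a, yi in zip(ylag, yt))
s2y = sum(c * yi for c, yi in zip(xlag, yt))
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
sse_unrestricted = sum((yi - b1 * l - b2 * c) ** 2
                       for yi, l, c in zip(yt, ylag, xlag))

# Lagged x cuts the prediction error sharply: x "Granger-causes" y here.
print(f"SSE without lagged x: {sse_restricted:.0f}, with: {sse_unrestricted:.0f}")
```

As the comment stresses, passing such a test establishes only temporal precedence: a cock that crows in anticipation of sunrise would "Granger-cause" the sunrise too.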

I agree with this: no matter how brilliant the research or the insight, someone can always raise their hand (or voice) and say that there's an identification problem.

I also agree -- and many, maybe most?, economists also agree -- that if an effect doesn't show up in simple OLS but does show up in some fancier model then we should be skeptical. But conversely, something that does show up in OLS is not necessarily to be believed either, if there's a serious identification problem.

In this thread:

Joël asks an actual question
Prior goes full autist over the definition of science
dearieme employs British wit
Mkt42 actually answers the question

And of course everyone ignores that micro exists, the actually respectable half of the discipline

I wondered the same thing. "Identification" here means the authors' choice of variables -- or at least that's what the first reference in the linked paper says; they claim it's synonymous with "research design." I'd guess (but I ain't no economist, so it's only a poorly informed guess) that this bleeds into data processing, metrics (how you measure what you want to evaluate), etc. Using the example of the cock crowing and sunrise, the experiment won't "see" the orbital dynamics that actually cause Earth's spin, because the relevant variables weren't part of (weren't identified in) the experiment. (Orbital dynamics and conservation of angular momentum are the reason -- not well understood, imho -- that a collapsing protostar almost invariably produces a lot of planets, and unless they're so close as to become tidally locked, they will likely have spin, i.e. rotation, days and nights. At least the rocky ones...)

This gets at a lot of what distinguishes econometrics from statistics as well as the quantitative methods used in other social sciences.

When faced with a bunch of data (Dismalist's example of a bunch of observations of prices and quantities is a very good, indeed classic, one), how do you analyze it? One of the things that causes me to be inherently skeptical of machine learning is the total reliance on data (and machine learning) and the ignorance or rejection of theory.

Econometricians are at the opposite extreme, they'll apply a theoretical economic model (and if there isn't one for their application, they'll create one) -- such as supply and demand -- and apply it to the data, and then do the statistical estimation using the constraints (and identification) implied by the model.

Or instead of supply and demand, if it's sunrises then use a model of the solar system, earth's rotation, etc.

We can instantly see how science can make better advances when there are good theoretical models instead of just having a computer look for patterns in a bunch of sunrise data.

But we can also see the big problem: what if the econometrician's assumed model is incorrect? Or some other economist has an alternative model? What if both of their models are incorrect?

As Dismalist said in another comment, and I endorse: the natural sciences have it easy. Even a complex system such as the earth's weather can be assumed to operate on unchanging laws of nature. Humans in contrast change their behavior and in ways that are deliberate rather than merely random.

What does "identification" mean in econometrics? "In econometrics you specify a model for how data comes to exist. The model has some random variable(s) in it so doesn't tell you exactly what data will exist but it might tell you something about the relative likelihood of different hypothetically possible data sets. The model typically has some unknown parameters which you intend to estimate. An identification problem exists if the mathematical nature of the model is such that changing the value of some parameter(s) does not alter the relative likelihood of different potential data sets. It's a problem because you then can't use the data that you actually do have as a basis for estimating the values of those parameters. Imagine a country where everybody always wears a hat. You have a randomly selected sample of heights (hats included) and want to estimate the population average height of a person (hat not included). You can't do it because a population of short people with tall hats can lead to the same data as would a population of tall people with short hats."
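The hat example translates directly into code: two different parameter settings generate observationally equivalent data, so no amount of data picks between them. The means and standard deviations below are invented for illustration:

```python
# Two worlds -- tall people with short hats vs. short people with tall hats --
# imply the same distribution of observed totals, so the data alone cannot
# identify mean height. All numbers are invented.
import random

def observed_totals(mean_height, mean_hat, n, seed):
    rng = random.Random(seed)
    # person height and hat height are each normal; only their sum is observed
    return [rng.gauss(mean_height, 5) + rng.gauss(mean_hat, 5) for _ in range(n)]

tall_people = observed_totals(mean_height=170, mean_hat=20, n=10_000, seed=3)
short_people = observed_totals(mean_height=160, mean_hat=30, n=10_000, seed=4)

mean_a = sum(tall_people) / len(tall_people)
mean_b = sum(short_people) / len(short_people)
# Both samples center on 190: no estimator can tell the two worlds apart.
print(f"{mean_a:.1f} vs {mean_b:.1f}")
```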

I recall reading a study on bias in economic research. The conclusion of the author(s) was that there is no bias in the collection or interpretation of data; rather, the bias is in the selection of the topic to research.

Most of us would agree that "fake news" is a problem because of the potential to diminish public confidence. Should there be a law against "fake news" with penalties against those who publish it? Singapore adopted such a law. But how's this for identification bias: the law cannot be used to correct any falsehoods published by the Singapore government — not unless you can convince a minister from the party in power (the People's Action Party) to act against his own party. Which is about as likely as snow in Florida in July.

What does media/social media political/social advocacy have to do with changes in state of economics?

Is there a good reference that explains what all these terms of art mean?

Just searching MR it’s apparent that TC and AT have devoted very, very little lately to something that will soon have an overwhelming impact on the field of economics. I guess we’ll have to wait until next year when it can’t be ignored.

... the coin of the realm for economists is "papers" -- the more the better.

Accuracy and real-world utility of these papers' content is a far distant concern. Absurdly complex mathematical & statistical analysis and contrived jargon mask the lack of useful substance.

Name even one professional economic insight, discovery, or accurate forecast that has significantly affected or improved the daily lives of average people in past 75 years.

The coin of the realm is citations, not papers, although to get citations one needs either papers or books or book chapters. Note that quite a few Nobel prizes have been given for books and even a few for book chapters (although that usually gets downplayed). However, increasingly what matters now is papers, especially ones published in the top 5 journals. But citations trump the latter and everything else.

Is economics having the same replication crisis as all the other academic "sciences"?

"a virtually linear rise in the fraction of papers"

That's poor phrasing. Did the author just mean a "rise in the fraction of papers" and thought that adding the nonsense descriptors "virtually linear" sounded more mathy?

Has economics changed? You betcha. Libertarians aren't libertarian anymore:


... and some GMU person chose to re-name Democratic Socialism as State Capacity Libertarianism

Why not? They are both utterly meaningless phrases.

Maybe 10% at most of papers should be about identification. The current obsession with identification at the expense of less rigorous but still informative empirical papers on important issues that can't be RCT'd is ridiculous. It's the debacle of game theory in the 1980s all over again. A new technique becomes de rigueur at the expense of less fashionable work, regardless of the relative value of the papers' contributions, leading to self-censorship and rejection by most good journals of valuable research.

Meanwhile, what does "identification" mean in the NBA? I do not know.

'The Oklahoma City Thunder, for instance, has vice presidents of “insight & foresight” and “identification & intelligence,” while former sportswriter Lee Jenkins serves as the Los Angeles Clippers’ “executive director of research and identity.”'

I see a marked lack of focus on complexity science. Since the economy is a Complex Adaptive System, you'd think economists would embrace the study of such systems.

The only problem is that, as ecologists can tell you, fully embracing the real nature of the economy will tell you that it can't be predicted, it can't be controlled, and even cause and effect can be impossible to ascertain.

Therefore, complexity science destroys the idea that smart economists can plan and control the economy. Can't have that. No precautionary principle for economists!
