Why aren’t non-parametric statistics more popular in economics?

Posted on March 9, 2009 at 7:36 am in Science

Abel, a loyal MR reader from Valencia, asks:

I've recently been reading an introductory book on nonparametric models (Nonparametric and Semiparametric Models: An Introduction, Härdle et al.), and the apparent flexibility of the approach makes me wonder why those models aren't used more in empirical economics.

Are their drawbacks too big? (The so-called "curse of dimensionality")
Perhaps it's just a lack of knowledge or willingness to learn in the economics community?
Are they perceived as a threat to conventional or more established estimation methods?

I would be really glad to hear your opinion and also the feedback in the comments.

He could have added "signalling" to the list, since many non-parametric methods are relatively easy and thus do not demonstrate the skill of the researcher.  But the fundamental reason I think has to do with the nature of economics: non-parametric methods are most likely when you don't have a well-defined, formal structural model in mind.  But many MR readers know more about this than I do, so please offer us your opinions…

Pedro March 9, 2009 at 8:11 am

In univariate models, a main advantage of non-parametric methods is that you get the shape of the distribution and can represent it graphically. With multidimensional models, which are the bread and butter of the profession, you simply lose that big advantage of non-parametric estimation. Aside from that, the curse of dimensionality means that precise estimates are much harder to obtain in models with many variables, because convergence slows sharply as variables are added. I wouldn’t put lack of knowledge or willingness to learn among the reasons they are not used (many of the methods are relatively simple to understand and to apply, as Tyler points out), but I also wouldn’t give much weight to the idea that they don’t signal skill (a lot of people in second- and third-tier schools are desperate for clear-cut methods with easily understandable results).
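Pedro's univariate point can be made concrete with a kernel density estimate, which recovers the shape of a distribution without assuming a functional form. Below is a minimal Gaussian-kernel sketch in plain Python. The function name `kernel_density` and the bandwidth argument `h` are illustrative choices (not SciPy's `gaussian_kde` API), and choosing the bandwidth is left to the caller.

```python
import math

def kernel_density(data, grid, h):
    """Gaussian-kernel density estimate of `data`, evaluated at each point in `grid`."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    # Each grid point gets the average of Gaussian bumps centred on the data points.
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in data)
        for g in grid
    ]

# Usage (made-up sample): evaluate on a grid, then plot grid vs. density to see the shape.
sample = [1.2, 1.9, 2.1, 2.4, 3.8, 4.0, 4.1]
grid = [i / 10 for i in range(0, 61)]
density = kernel_density(sample, grid, h=0.5)
```

With one variable the result is a curve you can simply look at; with many variables there is no comparable picture, which is the advantage Pedro says is lost.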

To my mind, the main drawback is the problem of dimensionality when it comes to strictly non-parametric methods.
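One way to see why dimensionality bites: suppose the data are spread uniformly over a unit hypercube, and a local (kernel or nearest-neighbor) estimator wants each neighborhood to hold a fixed share of the sample. The sub-cube's edge length must then be share ** (1 / d), which grows quickly with the number of variables d. The helper below is a hypothetical illustration under that uniform-data assumption, not something from the comment.

```python
def edge_length_for_share(share, dims):
    """Side of a sub-cube containing `share` of points uniform on [0, 1]^dims."""
    return share ** (1.0 / dims)

for d in (1, 2, 5, 10):
    # Target: each local neighborhood holds 10% of the sample.
    print(d, round(edge_length_for_share(0.10, d), 3))
```

With 10 variables, a neighborhood holding 10% of the data must span about 79% of the range of every variable, so it is no longer "local" in any useful sense.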

Robert Bell March 9, 2009 at 8:16 am

One opinion, from John Cochrane, “Asset Pricing, Revised Edition”, Section 11.7 “Estimating the Spectral Density Matrix”, p. 222:

“‘Non-parametric’ corrections such as (11.19) often do not perform very well in typical samples. The problem is that ‘non-parametric’ techniques are really very highly parametric; you have to estimate many correlations in the data.”
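Cochrane's point that such corrections require estimating many correlations shows up in the Bartlett-kernel (Newey-West) long-run variance estimator, one common correction of this type. We are not claiming it is exactly his equation (11.19). In this sketch the function name is hypothetical and the truncation parameter `lags` is chosen by the user; each additional lag is one more autocovariance estimated from the same sample.

```python
import numpy as np

def newey_west_lrv(x, lags):
    """Bartlett-kernel (Newey-West) estimate of the long-run variance of a series."""
    e = np.asarray(x, dtype=float)
    e = e - e.mean()
    n = e.size
    s = e @ e / n  # lag-0 autocovariance
    for j in range(1, lags + 1):
        gamma_j = e[j:] @ e[:-j] / n  # lag-j autocovariance: one more quantity to estimate
        s += 2.0 * (1.0 - j / (lags + 1.0)) * gamma_j  # Bartlett weight keeps s >= 0
    return float(s)

# A perfectly alternating series: negative autocorrelation shrinks the long-run variance.
print(newey_west_lrv([1.0, -1.0, 1.0, -1.0], lags=1))  # → 0.25
```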

Jonathan Falk March 9, 2009 at 8:22 am

As someone who has argued for more nonparametric methods for years, I usually hear two objections. The first, as Pedro points out above, is dimensionality. But I find that objection is sometimes overstated, since the researcher is often really only interested in one or two effects; the rest are nuisance variables which can be subtracted out parametrically, leaving a nonparametric representation for the effects we’re really interested in. The second objection is that the results lack conciseness, an objection I often find ridiculous coming from someone who is actually looking for an answer, although I grant that it is easier to explain a coefficient than a lowess curve. I love Tyler’s signalling answer, since it reveals what I think is the greatest advantage of nonparametric methods: the ability to explain exactly what you did to nonstatisticians and allow them to make cogent critiques of what you’ve done. By the way, my introduction to nonparametric techniques was not in grad school (of course!) but through the baseball writings of Bill James in the eighties.
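The idea of handling nuisance variables parametrically while leaving the effect of interest unrestricted corresponds to the partially linear model y = g(x) + beta * z + e. One standard estimator is Robinson's (1988) double-residual approach, sketched below with a Nadaraya-Watson kernel smoother and a single nuisance regressor `z`. The function names, the Gaussian kernel, the fixed bandwidth `h`, and the simulated data are illustrative assumptions; this is not a production implementation.

```python
import numpy as np

def nw_smooth(x_train, y_train, x_eval, h):
    """Nadaraya-Watson kernel regression of y_train on x_train, evaluated at x_eval."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    u = (x_eval[:, None] - x_train[None, :]) / h
    w = np.exp(-0.5 * u**2)  # Gaussian kernel weights
    return (w @ y_train) / w.sum(axis=1)

def robinson_partially_linear(y, z, x, h):
    """Estimate y = g(x) + beta * z + e with g unrestricted (double residuals)."""
    # Step 1: remove the part of y and z explained by x, nonparametrically.
    ry = y - nw_smooth(x, y, x, h)
    rz = z - nw_smooth(x, z, x, h)
    # Step 2: the nuisance coefficient is OLS of ry on rz (no intercept).
    beta = float(rz @ ry / (rz @ rz))
    # Step 3: recover g by smoothing what is left after removing beta * z.
    def g_hat(x_new):
        return nw_smooth(x, y - beta * z, x_new, h)
    return beta, g_hat

# Simulated example: g(x) = sin(x) is the effect of interest, z is a nuisance regressor.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-2.0, 2.0, n)
z = x + rng.normal(0.0, 1.0, n)  # z is correlated with x
y = np.sin(x) + 2.0 * z + rng.normal(0.0, 0.1, n)
beta, g_hat = robinson_partially_linear(y, z, x, h=0.3)
print(beta, g_hat([0.0, 1.0]))
```

The reported output is one number for the nuisance effect plus a curve g_hat for the effect of interest, which can be plotted like the lowess curve Jonathan mentions.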

scarcity March 9, 2009 at 8:54 am

The signaling argument works only for the upper tier of academics and journals. As Pedro suggests, most applied researchers are not interested in demonstrating their structural modeling skills, at least not as applied to data. The “kitchen-sink” regression and the reduced-form approach are too common for the signaling story to carry much weight.

mravery March 9, 2009 at 9:29 am

The principal benefit of nonparametric models is their flexibility: you can perform inference and estimation without broad, sweeping (and highly restrictive) model assumptions.

I posit that nonparametric estimation isn’t popular among economists because economists have been integrating broad, sweeping assumptions into their analysis since the start of the profession and are thus unimpressed by this benefit.

A.M. March 9, 2009 at 9:35 am

I’m with Tyler: “But the fundamental reason I think has to do with the nature of economics: non-parametric methods are most likely when you don’t have a well-defined, formal structural model in mind.”
But this brings up a related question: why don’t I see any Bayesian statistics in econ? (I will accept “you’re looking in the wrong place” as an answer.) Bayes seems like a natural fit because we already have so much nice theoretical framework.
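The intuition that theory could supply priors is easiest to see in the textbook conjugate normal case: a prior belief about a parameter (for instance, a value implied by a theoretical model) is combined with data whose noise variance is treated as known. This is a sketch under those assumptions; the function name and the numbers in the usage line are made up for illustration.

```python
def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior mean and variance of a normal mean, assuming known noise variance."""
    n = len(data)
    sample_mean = sum(data) / n
    # Precisions (inverse variances) add; the posterior mean is a precision-weighted average.
    post_precision = 1.0 / prior_var + n / noise_var
    post_mean = (prior_mean / prior_var + n * sample_mean / noise_var) / post_precision
    return post_mean, 1.0 / post_precision

# Hypothetical: theory suggests a value near 0; four noisy observations average 2.
mean, var = normal_posterior(prior_mean=0.0, prior_var=1.0,
                             data=[2.0, 2.0, 2.0, 2.0], noise_var=4.0)
print(mean, var)  # → 1.0 0.5
```

Here the prior and the data carry equal precision, so the posterior mean lands halfway between them; a tighter theoretical prior would pull it closer to 0.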

gurg March 9, 2009 at 10:21 am

A couple of points. First, cutting-edge methods now routinely incorporate non-parametric methods; see, e.g., the Heckman-Vytlacil and Manski approaches in the treatment effects literature. Second, until the popular software packages incorporate easy-to-use nonparametric routines, most folks (i.e., those not on the cutting edge) will continue to use the fully and/or semi-parametric methods included in those packages. Relatively few people roll their own in GAUSS or MATLAB or what have you. Third, I think most people doing empirical research do not recognize the degree to which they are imposing restrictions on their models. One example I’ve run into a lot is people insisting that quantile regressions cannot have better properties than least squares because least squares is BLUE (missing the point that BLUE is limited to *linear* estimators). I suspect that in 20 years there will be a lot more non-parametric work out there.
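The point that BLUE only ranks linear unbiased estimators can be checked by simulation. In a pure location model with Laplace (double-exponential) errors, the sample mean is the OLS estimator. The sample median is the least-absolute-deviations (median quantile regression) estimator with only an intercept; it is nonlinear in the data and asymptotically has about half the sampling variance. The error distribution, sample size, replication count, and function name are illustrative choices.

```python
import numpy as np

def mean_vs_median_variance(n=101, reps=2000, seed=0):
    """Sampling variance of the mean (OLS) and the median (LAD) under Laplace errors."""
    rng = np.random.default_rng(seed)
    samples = rng.laplace(loc=0.0, scale=1.0, size=(reps, n))
    var_mean = samples.mean(axis=1).var()
    var_median = np.median(samples, axis=1).var()
    return float(var_mean), float(var_median)

var_mean, var_median = mean_vs_median_variance()
print(f"mean: {var_mean:.4f}  median: {var_median:.4f}")
```

Under normal errors the ranking would flip, which is the sense in which the choice depends on restrictions the researcher may not notice imposing.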

Some Random Economist March 9, 2009 at 10:52 am

“the rest are nuisance variables which can be subtracted out parametrically, leaving a nonparametric representation for the effects we’re really interested in.”

At this point, you’re doing semi-parametric estimation, which is increasingly common.

The main reason I think you don’t see more nonparametric estimation in economics is that economics often deals with problems (e.g., anything with endogeneity) that are difficult to handle without imposing some structure.

jason voorhees March 9, 2009 at 11:26 am

Tyler’s suggestion doesn’t seem right. The use of OLS is widespread, and that isn’t because everyone thinks the formal structure of the model is additive. The reduced-form econometrics that Angrist has popularized is, after all, non-structural in nature.

There is some interest in non-parametric matching estimators, like those Imbens and Abadie have put forth, because there is interest in more ways of estimating treatment effects.
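The simplest form of the matching idea is one-nearest-neighbor matching on a single covariate. Each treated unit is paired with the closest control, and the average treatment effect on the treated (ATT) is the mean of the paired differences. This bare sketch matches with replacement and applies no bias correction, so it is not the Abadie-Imbens estimator itself; the function name and toy numbers are invented.

```python
import numpy as np

def att_nearest_neighbor(x_treat, y_treat, x_ctrl, y_ctrl):
    """ATT from 1-nearest-neighbor matching (with replacement) on one covariate."""
    x_treat = np.asarray(x_treat, dtype=float)
    y_treat = np.asarray(y_treat, dtype=float)
    x_ctrl = np.asarray(x_ctrl, dtype=float)
    y_ctrl = np.asarray(y_ctrl, dtype=float)
    # Distance from every treated unit (rows) to every control unit (columns).
    dist = np.abs(x_treat[:, None] - x_ctrl[None, :])
    match = dist.argmin(axis=1)  # ties go to the first control listed
    return float(np.mean(y_treat - y_ctrl[match]))

att = att_nearest_neighbor([1.0, 2.0], [5.0, 7.0], [0.9, 2.2, 5.0], [3.0, 4.0, 10.0])
print(att)  # → 2.5
```

No functional form links the outcome to the covariate; the comparison is purely local, which is what makes the method non-parametric.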

Randomista March 9, 2009 at 12:09 pm

Presentation is important. A parametric specification may not be “correct” (which we seldom know for sure), but it is much easier to present. When the data is good, parametric and non-parametric statistics usually provide similar inferences, and in that case the non-parametric method is unnecessarily complicated. Reliable studies usually fall into this category: the underlying analysis is non-parametric, but the presentation is parametric.

If the data is bad, no statistical model can save the study.

That leaves non-parametric (or semi-parametric) methods for situations where the inferences are sensitive to the parametric specification. Crafting a reliable and convincing study from data with such limitations is not easy. Sometimes the most cost-effective non-parametric method is to collect data from randomly assigned controlled experiments!
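Randomized experiments support randomization (permutation) inference, which needs no distributional assumptions. Under the null of no effect, every reassignment of the treatment labels is equally likely, so the p-value is the share of reassignments giving a difference in means at least as large (in absolute value) as the one observed. The sketch enumerates all assignments, which is only feasible for tiny samples; the function name and data are made up.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(treated, control):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(treated) + list(control)
    n_t = len(treated)
    observed = mean(treated) - mean(control)
    count = total = 0
    # Enumerate every way to label n_t of the pooled units as "treated".
    for idx in combinations(range(len(pooled)), n_t):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(t) - mean(c)) >= abs(observed) - 1e-12:
            count += 1
        total += 1
    return count / total

print(permutation_p_value([3, 4, 5], [0, 1, 2]))  # → 0.1
```

Only 2 of the 20 possible assignments produce a gap as extreme as the observed one, and the justification comes from the random assignment itself rather than from a model.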

Until economists do more experiments, non-parametric methods will always be secondary.

Barkley Rosser March 9, 2009 at 1:54 pm

The matter of functional form is important. Non-parametrics pretty much become inevitable if one is dealing with a nonlinear form that cannot be transformed into a linear one. I have also seen that quite often non-parametric estimations, including kernel ones, will outperform parametric ones in out-of-sample forecasting, which many would argue is the real bottom line.
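The kind of out-of-sample comparison described here can be sketched as a simple holdout exercise: fit a straight line and a Nadaraya-Watson kernel regression on one half of a sample, then compare squared prediction errors on the other half. The data are simulated from a nonlinear function by construction, so this illustrates the procedure and is not evidence for how often kernels win on real economic data. The bandwidth, split, seed, and function names are assumptions.

```python
import numpy as np

def nw_predict(x_train, y_train, x_new, h):
    """Nadaraya-Watson (Gaussian kernel) prediction at the points x_new."""
    u = (x_new[:, None] - x_train[None, :]) / h
    w = np.exp(-0.5 * u**2)
    return (w @ y_train) / w.sum(axis=1)

def holdout_mse(n=400, h=0.3, seed=1):
    """Out-of-sample MSE of a straight line vs. a kernel regression on simulated data."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, n)
    y = np.sin(2.0 * x) + rng.normal(0.0, 0.3, n)  # nonlinear by construction
    train, test = slice(0, n // 2), slice(n // 2, n)  # first half fits, second half scores
    slope, intercept = np.polyfit(x[train], y[train], 1)
    mse_linear = np.mean((y[test] - (intercept + slope * x[test])) ** 2)
    mse_kernel = np.mean((y[test] - nw_predict(x[train], y[train], x[test], h)) ** 2)
    return float(mse_linear), float(mse_kernel)

mse_linear, mse_kernel = holdout_mse()
print(f"linear: {mse_linear:.3f}  kernel: {mse_kernel:.3f}")
```

If the true relation were linear, the line would usually win on the same test, because the kernel pays a variance cost for flexibility it does not need.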

BTW, I also see Bayesian methods continuing to spread into econometrics.

jason voorhees March 9, 2009 at 4:10 pm

The one thing keeping Bayesian econometrics from making headway into economics is that no one has developed a -breg- routine in Stata. (buh dum dum!)

Seriously, though: the real divide in econometrics is the structural vs. reduced-form (“causal effects”) divide.

Barkley Rosser March 9, 2009 at 6:45 pm

Actually, Frank raises a point that sharply challenges a part of Tyler’s original post that I found questionable, namely the claim that “non-parametric methods are relatively easy.” Really? Frank suggests just the opposite: that they do not get taught precisely because they are “hard.” Where did you get the idea that they are “easy,” relatively or absolutely, Tyler? In my experience, conventional economists wave their hands in unhappiness when confronted with studies using such statistics, because they have no idea what the heck is going on.

Brian March 9, 2009 at 10:24 pm

See Justin Tobias and John DiNardo’s “Nonparametric Density and Regression Estimation,” Journal of Economic Perspectives, 2001.

Statistician March 10, 2009 at 3:06 am

As a related matter, I hardly ever see economists use “robust methods” for regression and inference generally. We have known for over 50 years that OLS can get you into deep trouble when the underlying distribution has fat tails or when there are outliers. Indeed, in extreme cases such as a Cauchy distribution, increasing the sample size is no help: OLS estimates will never converge to population values.
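The Cauchy case is easy to verify by simulation. With only an intercept, OLS is the sample mean, and the mean of n independent standard Cauchy draws is itself standard Cauchy, so its spread does not shrink as n grows. The sample median, by contrast, does concentrate. The sketch compares interquartile ranges across replications; the sample sizes, seed, and function name are arbitrary choices.

```python
import numpy as np

def spread_of_estimates(n, reps=2000, seed=0):
    """IQR across replications of the sample mean and the sample median, Cauchy data."""
    rng = np.random.default_rng(seed)
    draws = rng.standard_cauchy(size=(reps, n))

    def iqr(v):
        q75, q25 = np.percentile(v, [75, 25])
        return float(q75 - q25)

    return iqr(draws.mean(axis=1)), iqr(np.median(draws, axis=1))

for n in (10, 100, 1000):
    mean_iqr, median_iqr = spread_of_estimates(n)
    print(n, round(mean_iqr, 2), round(median_iqr, 3))
```

The mean's IQR stays near 2 (the IQR of a standard Cauchy) at every n, while the median's shrinks roughly like 1 / sqrt(n).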

There are families of estimators, such as M-estimators, designed to be robust to the underlying distribution. We’ve had them for 35 years. But I doubt that more than a handful of economists even know they exist.
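As one example of the M-estimators mentioned above, here is Huber's location estimator computed by iteratively reweighted averaging. Observations within c times a robust scale of the current estimate get full weight, and those farther out are downweighted. The tuning constant c = 1.345 and the MAD-based scale are common textbook choices assumed here, not taken from the comment; the function name is illustrative.

```python
import statistics

def huber_location(data, c=1.345, tol=1e-10, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging."""
    mu = statistics.median(data)
    # Robust scale: median absolute deviation, rescaled to match the normal SD.
    scale = 1.4826 * statistics.median(abs(x - mu) for x in data)
    if scale == 0:
        return mu
    for _ in range(max_iter):
        # Full weight inside c * scale; weight falls off as 1/|residual| outside it.
        weights = [min(1.0, c * scale / abs(x - mu)) if x != mu else 1.0 for x in data]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(statistics.mean(data), huber_location(data))
```

The single wild value drags the mean to 22, while the Huber estimate stays close to 3. Unlike simply deleting the point, the outlier is kept but given little weight.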

Ahmad March 10, 2009 at 3:37 am

There are cases in which nonparametric estimation is very useful, but the curse of dimensionality is a big issue, and in many cases there are other, more important things to care about, like time-series issues, endogeneity, and imposing economic restrictions, which are much more difficult with nonparametric and semiparametric models. It also takes substantial time to learn nonparametric methods. I don’t think the established profession sees them as a threat, but within econometrics, kernel methods are the main nonparametric tools, while other competing and sometimes more useful techniques go unused by economists, probably because of learning costs and the influence of the people leading the field.

Abel March 10, 2009 at 9:41 am

Thanks for covering my question, Mr. Cowen. The reason I asked all of you about nonparametric models is that I’ve been thinking about separating the different factors that impede international trade.

If any of you have worked on the issue, you’ll probably know how restrictive the linearity assumptions are when dealing with factors such as conventional tariffs, technical barriers to trade, information costs, cultural differences, home bias, etc. How can one try to explain the “missing trade” while assuming some sort of linearity in variables of such different natures?

More feedback will be, of course, welcome!

Barkley Rosser March 10, 2009 at 11:48 am

Oh yes, regarding this issue of ‘tail events,’ I find myself having to warn colleagues against the all-too-widespread practice of “throwing out outliers.” Unless one has reason to believe that an outlier is actually an incorrect datum, one should not in general throw it out. As Taleb and others have emphasized, sometimes the most important data points of all are exactly those “outliers.”

Jacob Weaver March 10, 2009 at 12:40 pm

Barkley,

Thanks for your reply. I agree, absolutely, that unobserved elements of the distribution are problematic for parametric models as well, and I didn’t mean to suggest that the black swan problem is the only issue when it comes to data quality. Indeed, in economics in general, the difficulty of collecting reliable estimates of anything from unemployment to the price level suggests that there ought to be more skepticism of any data, let alone the models applied to it.

As for hazarding a guess as to why parametric models dominate in economics, I think that the (at least hypothetical) possibility that the parametric assumption could account for heavy tails or other data issues might explain the predominance of those models. Obviously this is not the case if you just go around assuming everything is Gaussian, but the point I’m trying to make is that, given prior knowledge of the ‘true’ distribution function, the parametric model will dominate. Economists, whether from arrogance or over-reliance on mathematical constructs, often start with pretty strong beliefs about the distribution.
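The claim that correct prior knowledge of the distribution lets a parametric model dominate can be illustrated in the case most favorable to it: data that really are normal. The sketch fits a normal by its sample mean and standard deviation and compares it with a Gaussian kernel density estimate (Silverman's rule-of-thumb bandwidth), using integrated squared error against the true density. If the assumed family were wrong, the comparison could reverse. The sample size, grid, replication count, and function names are arbitrary choices.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def kde(data, grid, h):
    """Gaussian kernel density estimate evaluated on `grid`."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (data.size * h * np.sqrt(2 * np.pi))

def compare_ise(n=200, reps=50, seed=0):
    """Mean integrated squared error: fitted normal vs. KDE, when the truth is N(0, 1)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-5.0, 5.0, 1001)
    dx = grid[1] - grid[0]
    truth = normal_pdf(grid, 0.0, 1.0)
    ise_param, ise_kde = [], []
    for _ in range(reps):
        data = rng.standard_normal(n)
        mu, sigma = data.mean(), data.std(ddof=1)
        h = 1.06 * sigma * n ** (-1 / 5)  # Silverman's rule of thumb
        ise_param.append(((normal_pdf(grid, mu, sigma) - truth) ** 2).sum() * dx)
        ise_kde.append(((kde(data, grid, h) - truth) ** 2).sum() * dx)
    return float(np.mean(ise_param)), float(np.mean(ise_kde))

print(compare_ise())
```

The parametric fit only has to pin down two numbers, so its error shrinks at the usual 1/n rate, while the kernel estimate converges more slowly.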

Jacob Weaver March 10, 2009 at 12:59 pm

And by the way, I don’t mean that as a critique of mathematical economics, which often presents a more accurate theory of the world than what can be extrapolated from flawed data.

Barkley Rosser March 11, 2009 at 1:01 am

liht and Furstenbank and Ravio is the shithead.

Jeff Smith March 11, 2009 at 8:21 am

What a fine discussion!

I posted some comments on my blog (econjeff, blogspot).

Thanks to Olivia for the mention and for stating my point about matching not being a “magic bullet”.

Barkley Rosser March 12, 2009 at 10:53 am

Dave,

Contact me off-list and tell me where you submitted your papers and what their status is. I edit JEBO.
