That is a new paper by Abel Brodeur, Mathias Lé, Marc Sangnier, and Yanos Zylberberg:

Abstract: Journals favor rejections of the null hypothesis. This selection upon results may distort the behavior of researchers. Using 50,000 tests published between 2005 and 2011 in the AER, JPE and QJE, we identify a residual in the distribution of tests that cannot be explained by selection. The distribution of p-values exhibits a camel shape with abundant p-values above .25, a valley between .25 and .10 and a bump slightly under .05. Missing tests are those which would have been accepted but close to being rejected (p-values between .25 and .10). We show that this pattern corresponds to a shift in the distribution of p-values: between 10% and 20% of marginally rejected tests are misallocated. Our interpretation is that researchers might be tempted to inflate the value of their tests by choosing the specification that provides the highest statistics. Inflation is larger in articles where stars are used in order to highlight statistical significance and lower in articles with theoretical models.

For the pointer I thank Michelle Dawson.

**Addendum**: Here is related commentary from Mark Thoma.

I can’t wait for the day when we stop looking for the probability of the data given the hypothesis when what we really want is the (subjective) probability of the hypothesis given the data.

“Inflation is larger in articles where stars are used in order to highlight statistical significance and lower in articles with theoretical models.”

If you are going to allocate the same amount of time to an article, and let's say you have a "theoretical model," you obviously are going to have less time to play with the data.

I don’t see why this is necessarily good.

“We see that pattern in the data. Message to political science professors: you are being watched. And if you report results just barely above the significance level, we want to see your work….”

Still people think the problem is the individuals rather than the incentives.

Seriously, who cares? This doesn't impact the analysis at all. If people need to get over a psychological barrier and the cost is fudging a p-value from 0.051 to 0.049, does it really matter?

I agree with Andrew: change the incentives before pretending that rational actions are somehow destructive.

Scenario 1: Physicists are trying to determine whether the p-value for the Higgs data is at most 0.0000003, and so far they've gotten below 0.00003. They are doing the p-value "inflation" by taking more data. The LHC will stop publishing collected data as "Higgs searches" at that point because 5 sigma is "discovery," and funding directed at publishing data as "Higgs searches" will be used for other things. In physics papers, there is probably a dearth of papers reporting 4.9-sigma particles and a surplus at 5 sigma. Is this nefarious? I don't think so.
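For readers translating between the two conventions: particle physics quotes significance in sigmas, meaning the one-sided normal tail probability. A minimal sketch of the conversion behind the numbers above (the `one_sided_p` name is ours, not from any physics library):

```python
import math

def one_sided_p(sigma: float) -> float:
    """One-sided normal tail probability for a z-score of `sigma`,
    the convention particle physicists use to quote significance."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

print(f"5 sigma: {one_sided_p(5):.1e}")  # ~2.9e-07, the 0.0000003 discovery threshold
print(f"4 sigma: {one_sided_p(4):.1e}")  # ~3.2e-05, roughly the "below 0.00003" level
```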

Scenario 2: Testing of an AIDS therapy begins to yield results that appear to curb transmission, and the trial is stopped. The sample population of 1763 was divided in half (yes, I know) and there were 27 infections in one group and 1 in the other. Statistical error would be on the order of sqrt(1763/2) ~ 30, so the two results are within the error. There are moral considerations in this particular case with human trials, so I don't think this is nefarious (there are ways of accounting for stopping early), but in general it is bad scientific practice to stop a study early when your results are at the 1-sigma level. I could show around 30% of coins are unfair using that methodology.

http://www.nytimes.com/2011/05/13/health/research/13hiv.html
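The "30% of coins are unfair" figure checks out even without any peeking: a one-shot two-sided test at the 1-sigma level flags about P(|Z| > 1) ≈ 32% of perfectly fair coins (slightly less for a discrete coin), and stopping early can only raise that. A minimal simulation, using nothing beyond the standard library (the `flag_unfair` name is ours):

```python
import math
import random

def flag_unfair(n_flips: int, z_cut: float, rng: random.Random) -> bool:
    """Flip a fair coin n_flips times and declare it 'unfair' if the
    final z-score exceeds z_cut (a one-shot test, no peeking)."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    z = abs(heads - n_flips / 2) / math.sqrt(n_flips / 4)  # sd = sqrt(n*p*q)
    return z > z_cut

rng = random.Random(42)
trials = 10_000
flagged = sum(flag_unfair(100, 1.0, rng) for _ in range(trials))
print(f"fair coins flagged at 1 sigma: {flagged / trials:.1%}")
# close to the ~30% the comment describes; getting a 5% false-positive
# rate would require roughly a 2-sigma cut instead
```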

Is this bump at 0.05 nefarious? Maybe, maybe not. People will stop their studies once they reach 0.05 and publish, if that is the norm. Or they will continue by adding a few more trials if they've only gotten to 0.051. It really depends on the individual papers.

I definitely think the population of submitted papers has this peak, and it's not solely the journals' fault (except sort of in the sense of Fed expectations: if people think the Fed wants 2% inflation, we get 2% inflation; if scientists think the journals want 0.05 p-values, we get 0.05 p-values).

“Physicists are trying to determine if the p-value for the data for the Higgs is at most 0.0000003 and so far they’ve gone below 0.00003. They are doing the p-value “inflation” by taking more data.”

For the record, if they apply that kind of optional stopping / data peeking, their p-value needs to be adjusted for it, and it might not drop at all.
