…our results suggest that the [instrumental variables] and, to a lesser extent, [difference-in-difference] research bodies have substantially more p-hacking and/or selective publication than those based on [randomized controlled trials] and [regression-discontinuity]… (p.3)
We find no evidence that: (1) Papers published in the ‘Top 5’ journals are different to others; (2) The journal ‘revise and resubmit’ process mitigates the problem; (3) Things are improving through time.
That is from this forthcoming AER paper by Brodeur, Cook, and Hayes.
In contrast, this blog post argues that:
I have proposed here that we should not infer that literatures with more bunching just past .05 are less trustworthy, and that visually striking comparisons of ‘expected’ and observed test results can be quite misleading due to incorrect assumptions about the expected line.
The authors respond here. I do not yet have an opinion on this dispute, but everyone is talking about it right now, so I thought I would at least send along the basic documents to you all.
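For readers unfamiliar with the underlying diagnostic, here is a minimal simulated sketch (my own illustration, not the authors' actual method) of what "bunching just past .05" means: if some marginal results are nudged over the z = 1.96 significance cutoff, narrow bins just above the cutoff hold excess mass relative to bins just below it. All numbers below (the distribution, the nudge size, the bin widths) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate absolute z-statistics from a smooth "no hacking" distribution.
# The normal(1.0, 1.5) choice is arbitrary, purely for illustration.
honest = np.abs(rng.normal(loc=1.0, scale=1.5, size=20_000))

# "Hacked" version: half of the near-misses just below 1.96 get
# nudged just past the significance cutoff.
hacked = honest.copy()
near_miss = (hacked > 1.76) & (hacked < 1.96)
push = rng.random(hacked.size) < 0.5
hacked[near_miss & push] += 0.20

def bin_counts(z, lo=1.76, hi=2.16, cut=1.96):
    """Counts in equal-width bins just below vs. just above the cutoff."""
    below = int(np.sum((z >= lo) & (z < cut)))
    above = int(np.sum((z >= cut) & (z < hi)))
    return below, above

b0, a0 = bin_counts(honest)   # smooth: the two bins are roughly comparable
b1, a1 = bin_counts(hacked)   # bunching: clear excess mass just past 1.96
print("no hacking :", b0, "below |", a0, "above")
print("with hacking:", b1, "below |", a1, "above")
```

The critique in the blog post is precisely that inferences like this hinge on the assumed "expected" shape of the honest distribution: if that assumption is wrong, an innocent literature can look like it is bunching.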