Results Free Review - Marginal REVOLUTION

If researchers test a hundred hypotheses at the conventional 5% significance level, about five will on average come up “statistically significant” even when the true effect in every case is zero. Unfortunately, the papers with statistically significant results are more likely to be published, especially as these results may seem novel, surprising or unexpected. This is the problem of publication bias.
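To make the arithmetic concrete, here is a minimal simulation sketch. It is my own illustration, not anything from the paper discussed below. It assumes each "study" compares two groups drawn from the same normal distribution with known unit variance and applies a two-sided z-test. The function name `count_false_positives` and the sample sizes are hypothetical choices.

```python
import random
from statistics import NormalDist, fmean


def count_false_positives(n_studies=100, n_per_group=30, alpha=0.05, seed=0):
    """Count 'significant' results when every true effect is zero."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of a difference in means, assuming unit variance in both groups.
    standard_error = (2 / n_per_group) ** 0.5
    significant = 0
    for _ in range(n_studies):
        # Both groups come from the same distribution, so the true effect is zero.
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / standard_error
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided z-test
        if p_value < alpha:
            significant += 1
    return significant


if __name__ == "__main__":
    hits = count_false_positives()
    print(f"{hits} of 100 null studies were 'significant' at the 5% level")
```

Any single run of a hundred studies will bounce around, but over many runs the share of "significant" results settles near 5%. If only those studies get published, the literature consists entirely of false positives.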

A potentially simple and yet powerful way to mitigate publication bias is for journals to commit to publish manuscripts without any knowledge of the actual findings. Authors might submit sophisticated research designs that serve as a registration of what they intend to do. Or they might submit already completed studies for which any mention of results is expunged from the submitted manuscript. Reviewers would carefully analyze the theory and research design of the article. If they found that the theoretical contribution was justifiably large and the design an appropriate test of the theoretical logic, then reviewers could recommend publication regardless of the final outcome of the research.

In a new paper (from which the above is quoted) the editors of a special issue of Comparative Political Studies report on an experiment using results-free review. Results-free review worked well. The referees spent a lot of time and effort thinking about theory and research design and the type of institutional and area-specific knowledge that would be necessary to make the results compelling. The quality of the submitted papers was high.

What the editors found, however, was that the demand for “significant” results was very strong and difficult to shake.

It seems especially difficult for referees and authors alike to accept that null findings might mean that a theory has been proved to be unhelpful for explaining some phenomenon, as opposed to being the result of mechanical problems with how the hypothesis was tested (low power, poor measures, etc.). Making this distinction, of course, is exactly the main benefit of results free peer review. Perhaps the single most compelling argument in favor of results-free peer review is that it allows for findings of non-relationships. Yet, our reviewers pushed back against making such calls. They appeared reluctant to endorse manuscripts in which null findings were possible, or if so, to interpret those null results as evidence against the existence of a hypothesized relationship. For some reviewers, this was a source of some consternation: Reviewing manuscripts without results made them aware of how they were making decisions based on the strength of findings, and also how much easier it was to feel “excited” by strong findings. This question even led to debate among the special issue editors on what are the standards for publishing a null finding?

I’ve seen this aversion to null results. In my paper with Goldschlag on regulation and dynamism, we find that regulation does not much influence standard measures of dynamism. It’s been very hard for reviewers to accept this result and I don’t think it’s simply because some referees believe strongly that regulation reduces dynamism. I think referees would be more likely to accept the exact same paper if the results were either negative or positive. That’s unscientific. Indeed, we should expect that most results are null results, so a null finding should give us, if anything, even more confidence in the paper! But as the above indicates, it’s very common to react to a null result as a sign that something is amiss.

Here, by the way, are the three papers reviewed before the results were tabulated. I suspect that some of these papers would not have been accepted at this journal under a standard refereeing system but that all of these papers are of above average quality.

The Effects of Authoritarian Iconography: An Experimental Test finds “no meaningful evidence that authoritarian iconography increases political compliance or support for the Emirati regime.”

Can Politicians Police Themselves? “Taking advantage of a randomized natural experiment embedded in Brazil’s State Audit Courts, we study how variation in the appointment mechanisms for choosing auditors affects political accountability. We show that auditors appointed under few constraints by elected officials punish lawbreaking politicians—particularly co-partisans—at lower rates than bureaucrats insulated from political influence. In addition, we find that even when executives are heavily constrained in their appointment of auditors by meritocratic and professional requirements, auditors still exhibit a pro-politician bias in decision making. Our results suggest that removing bias requires a level of insulation from politics rare among institutions of horizontal accountability.”

Banners, Barricades, and Bombs tests “competing theories about how we should expect the use of tactics with varying degrees of extremeness—including demonstrations, occupations, and bombings—to influence public opinion. We find that respondents are less likely to think the government should negotiate with organizations that use the tactic of bombing when compared with demonstrations or occupations. However, depending on the outcome variable and baseline category used in the analysis, we find mixed support for whether respondents think organizations that use bombings should receive less once negotiations begin. The results of this article are generally consistent with the theoretical and policy-based arguments centering around how governments should not negotiate with organizations that engage in violent activity commonly associated with terrorist organizations.”

Addendum: See also Robin Hanson’s earlier post on conclusion free review.