Testing peer review by running submissions through the process twice

In particular, about 57% of the papers accepted by the first committee were rejected by the second one and vice versa. In other words, most papers at NIPS would be rejected if one reran the conference review process (with a 95% confidence interval of 40-75%).

Here is another framing:

If the committees were purely random, at a 22.5% acceptance rate they would disagree on 77.5% of their acceptance lists on average.
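The 77.5% baseline can be checked with a quick simulation. This is only a sketch under stated assumptions: both committees pick their acceptance lists uniformly at random and independently, and the pool size, trial count, and function name are made up for illustration rather than taken from the NIPS experiment.

```python
import random


def random_disagreement(n_papers=1000, accept_rate=0.225, trials=500, seed=0):
    """Average share of committee 1's accepted papers that committee 2 rejects.

    Assumption: each committee accepts a uniformly random set of papers,
    independently of the other committee.
    """
    rng = random.Random(seed)
    n_accept = round(n_papers * accept_rate)
    papers = range(n_papers)
    total = 0.0
    for _ in range(trials):
        first = set(rng.sample(papers, n_accept))
        second = set(rng.sample(papers, n_accept))
        # Papers accepted by the first committee but not by the second.
        total += len(first - second) / n_accept
    return total / trials


if __name__ == "__main__":
    # Should land close to 1 - 0.225 = 0.775.
    print(random_disagreement())
```

Under these assumptions a paper accepted by one committee has a 22.5% chance of also being accepted by the other, so the expected disagreement is 1 - 0.225. The observed 57% sits below that random baseline but well above zero.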

That is from Eric Price on the NIPS experiment; there is more here.

For the pointer I thank a loyal MR reader.


So now we know that judging the worth of papers involves a certain amount of judgement. Glad that's been cleared up.

Now we know that the 'certain amount' of judgement is arbitrary.

Referee Recommendations
Ivo Welch; University of California, Los Angeles (UCLA); National Bureau of Economic Research (NBER)
"This paper analyzes referee recommendations at the SFS Cavalcade, where a known algorithm matched referees to submissions, and at eight prominent economics and finance journals (ECMTA, JEEA, JET, QJE, IER, RAND, JF, RFS). The behavior of referees was similar in all venues. The referee-specific component was about twice as important as the common component. Referees differed both in their scales (some referees were intrinsically more generous than others) and in their opinions of what a good paper was (they often disagreed about the relative ordering of papers). My paper quantifies these effects."

What is the purpose of publishing a paper? If the purpose is to vet the data, then every paper with accurate results should be published. If the purpose is to provide prestige to the authors, then we should keep the acceptance rates as low as possible.
I was thinking about where the inflection point would be on a graph of number of papers published and maximum usefulness. It’s obviously a fuzzy concept, but I think it would greatly depend on the purpose of publishing papers in the first place.
Does signaling within academia or field advancement take precedence?

The purpose is to provide tenure to like minded individuals.

If this is true, then why have a review process? Simply look at the name attached.

I also wonder how much variance there is across the different fields in the signaling vs usefulness relationship of papers published.

Because being part of the review process also goes on your tenure application (I've been there).

I can't comment on your hypothesis until I see the napkin.

Funny. But I don’t even have a hypothesis, just a question of why we have the process in the first place. If it is pure signaling, then looking too closely is pointless. If it isn't a peer review of the material but a review by committee of who they wish to be a peer, then a secondary committee is a waste of time. If it is just signaling, then we should keep the number of published papers low and pay more attention to who the author is than to what they wrote.
If we want a system that rewards advancements in knowledge, then we should publish the maximum number of papers that meet stringent requirements.
Real life is probably, at least in part, pretending the second to add legitimacy to the first. In which case, there is an equilibrium that needs to be maintained.

People can't read everything, so there's value in picking out the most useful papers, not just vetting the data. Although "useful" isn't the same as "dramatic" or even "interesting".

The PLoS One model just needs to be embraced more: are the methods good? Okay, you get published. No judging of importance, merit, etc.

So, the study brings evidence (just not conclusive) that the acceptance process is not completely random? They need a larger sample size before I'm convinced...

About the purpose of publishing a paper, each party involved has a different purpose. Researchers mostly want to raise a score, supporters want to see returns from the money they spent, and publishers want interesting and accurate results to increase their relevance. Optimizing for all groups is not a simple process.

Perhaps there are many different orthogonal dimensions upon which a paper may deserve to be rejected, but each reviewer only knows 1 or 2 of those dimensions. Different committee = different dimensions inspected. If there is a large enough pool of dimensions, then unless the committees are large enough, rejection can look like luck.
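A toy model of this idea, purely as a sketch: every number here (20 dimensions, 2 flaws per paper, 3 dimensions checked) and the function name are hypothetical, and I'm assuming a committee rejects a paper exactly when one of the dimensions it happens to inspect contains a flaw.

```python
import random


def committee_disagreement(n_papers=2000, n_dims=20, flaws_per_paper=2,
                           dims_checked=3, seed=0):
    """Share of papers on which two committees reach different decisions.

    Assumptions (all hypothetical): each paper is flawed on a random subset
    of dimensions, each committee inspects its own random subset for that
    paper, and it rejects if and only if it inspects a flawed dimension.
    """
    rng = random.Random(seed)
    dims = range(n_dims)
    disagreements = 0
    for _ in range(n_papers):
        flaws = set(rng.sample(dims, flaws_per_paper))
        seen_a = set(rng.sample(dims, dims_checked))
        seen_b = set(rng.sample(dims, dims_checked))
        reject_a = bool(flaws & seen_a)
        reject_b = bool(flaws & seen_b)
        disagreements += reject_a != reject_b
    return disagreements / n_papers


if __name__ == "__main__":
    for k in (1, 3, 10, 20):
        print(k, committee_disagreement(dims_checked=k))
```

In this toy setup the two committees agree completely once each one inspects every dimension, and disagree often when each sees only a few, even though nothing about any single decision is random once the inspected dimensions are fixed.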

This comment was rejected,

But not by your peers.

In law school, people who got bad grades would always say "exams are a crapshoot."

And the people who got good grades would laugh, secretly believing that exams were completely error-free measures of aptitude.

In general the people with good grades in law school have to loudly agree with the "crapshoot" folks; it's considered antisocial to do otherwise.

And of course they understand there is a certain amount of noise to the process, how else to explain the handful of completely unjustified B+s?

The reasons for rejection matter. If it was due to perceived errors in the publication, then this is a problem. If it was just due to one committee preferring some papers over others, this is hardly news. (If one committee is dominated by, say, the theoretical physicists and the other by the experimental folks, it would not be surprising if they chose different papers.)


