Gambling Can Save Science

Nearly thirty years ago my GMU colleague Robin Hanson asked, Could Gambling Save Science? We now know that the answer is yes. Robin's idea to gauge the quality of scientific theories using prediction markets, what he called idea futures, has been validated. Camerer et al. (2018), the latest paper from the Social Science Replication Project, tried to replicate 21 social-science studies published in Nature or Science between 2010 and 2015. Before the replications were run, the authors ran a prediction market, as they had done in previous replication research, and once again the prediction market did a very good job of predicting which studies would replicate and which would not.

Ed Yong summarizes in the Atlantic:

Consider the new results from the Social Sciences Replication Project, in which 24 researchers attempted to replicate social-science studies published between 2010 and 2015 in Nature and Science—the world’s top two scientific journals. The replicators ran much bigger versions of the original studies, recruiting around five times as many volunteers as before. They did all their work in the open, and ran their plans past the teams behind the original experiments. And ultimately, they could only reproduce the results of 13 out of 21 studies—62 percent.

As it turned out, that finding was entirely predictable. While the SSRP team was doing their experimental re-runs, they also ran a “prediction market”—a stock exchange in which volunteers could buy or sell “shares” in the 21 studies, based on how reproducible they seemed. They recruited 206 volunteers—a mix of psychologists and economists, students and professors, none of whom were involved in the SSRP itself. Each started with $100 and could earn more by correctly betting on studies that eventually panned out.

At the start of the market, shares for every study cost $0.50 each. As trading continued, those prices soared and dipped depending on the traders’ activities. And after two weeks, the final price reflected the traders’ collective view on the odds that each study would successfully replicate. So, for example, a stock price of $0.87 would mean a study had an 87 percent chance of replicating. Overall, the traders thought that studies in the market would replicate 63 percent of the time—a figure that was uncannily close to the actual 62-percent success rate.

The traders’ instincts were also unfailingly sound when it came to individual studies. Look at the graph below. The market assigned higher odds of success for the 13 studies that were successfully replicated than the eight that weren’t—compare the blue diamonds to the yellow diamonds.
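To make the price-as-probability reading in the excerpt above concrete, here is a minimal Python sketch; the closing prices, outcomes, and the $1-per-winning-share payout are placeholder assumptions, not the actual SSRP data or contract terms.

```python
# Minimal sketch of reading market prices as replication probabilities.
# The closing prices, outcomes, and $1-per-winning-share payout are
# placeholder assumptions, not the actual SSRP data or contract terms.

final_prices = [0.87, 0.62, 0.15, 0.44, 0.91]   # hypothetical closing prices
replicated   = [1,    1,    0,    0,    1]      # 1 = study replicated

# The market's implied overall replication rate vs. the realized rate.
implied_rate  = sum(final_prices) / len(final_prices)
realized_rate = sum(replicated) / len(replicated)
print(round(implied_rate, 2), round(realized_rate, 2))    # 0.6 0.6

# A share bought at its $0.50 starting price pays $1 if the study replicates.
stake, payout = 0.50, 1.00
print(payout - stake)    # $0.50 profit per winning share
```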

Comments

Picking between choices or options is now 'gambling'?

Of course, describing a stock exchange as gambling is quite defensible.

'As trading continued, those prices soared and dipped depending on the traders’ activities.'

That does not sound like gambling, but the normal functioning of a market. Particularly a market where a large amount of information and expertise is involved in creating the market's valuations.

A futures market is always gambling; you make bets based on your predictions. It's not as random as dice or roulette, but it's more like card-game gambling. You do not have 100% of the information needed for a fully logical decision, yet you try to predict the probability anyway. That's gambling.

'A futures market is always gambling; you make bets based on your predictions.'

Sure, but so is buying a car - you are 'gambling' that it will deliver the intended performance in the future.

Basically, any decision involving the future can be seen as a 'gamble.' Yet we rarely talk about the gamble of taking a shower, even though it is unknown whether slipping in it will cause a life-changing injury.

'You do not have 100% of the information needed for a fully logical decision'

However, in this case, people with putative knowledge and experience are being asked to judge whether something can be shown to be reliably true by repeating the experiment. One can call that a gamble, but it seems unnecessary. And though, as with anything involving the future, 100% certainty is impossible, in this case the studies provide 100% of the information involved in evaluating whether they can be replicated. The main variation is not in the information available in the studies but in the skills of those evaluating them, in a setting where those judgments are reflected through a market - that is, people with skill and experience tend to make similar predictions based on the same information. In essence, this seems to be something other than gambling.

Obviously, word definitions are flexible and can have multiple uses; one can safely gamble.

I like your comment very much; you are correct in the assessment that anything can be called gambling.

I would argue just one point: studies do not contain 100% of the information needed to replicate them. As a biochemist, I have seen numerous examples where papers had incomplete or even incorrect information; moreover, papers can sometimes be misleading, either by mistake or by design. For example, you can find seemingly endless papers on homeopathy, yet you can hardly call them easily replicable.

I would think that any study in economics (or another semi-humanities field) is even harder to prove or disprove and therefore has a probability of replication well below 100%.

Yes, everything is theoretically gambling, but when your probability is higher than some threshold, say 95%, we believe the claim to be true; below 5%, we call it false; anything in between is guessing, i.e. gambling. When you take a shower, you know empirically that the likelihood of a life-changing injury is low. Yet you have probably heard of cases like that, so you conclude that injury is unlikely but possible in theory and act accordingly (you probably don't jump around covered in oil and soap).

"...people with skill and experience tend to make similar predictions based on the same information."

But didn't all the bettors have about the same ability to evaluate the studies? And if so, who was on the other side of the bet?

All bettors had or were earning relevant PhDs, and all were interested in replicability, but (a) statistical & topic expertise must have varied, and (b) expertise is poorly correlated with forecasting performance.

Other side of the bet: the LMSR market maker is always ready to take the bet, and people disagree about whether a study is more like 50% or more like 70% likely to replicate, so the price moves, though not by a whole lot. If I've grabbed the right data file, each of the 21 markets stays almost always on the same side of 50%, and usually within about 20 percentage points.
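For readers unfamiliar with the mechanism, here is a minimal sketch of Hanson's logarithmic market scoring rule (LMSR) market maker mentioned above; the liquidity parameter, trade size, and two-outcome setup are illustrative assumptions, not the configuration used in the replication markets.

```python
import math

# Minimal sketch of Hanson's Logarithmic Market Scoring Rule (LMSR).
# b is the liquidity parameter; it and the trade below are illustrative
# assumptions, not the study's actual settings.

def lmsr_cost(quantities, b=100.0):
    # Cost function C(q) = b * ln(sum_i exp(q_i / b))
    return b * math.log(sum(math.exp(q / b) for q in quantities))

def lmsr_price(quantities, outcome, b=100.0):
    # Instantaneous price of one outcome, readable as its probability.
    total = sum(math.exp(q / b) for q in quantities)
    return math.exp(quantities[outcome] / b) / total

# Two outcomes: [replicates, does not replicate]; no shares sold yet.
q = [0.0, 0.0]
print(lmsr_price(q, 0))                      # 0.5, the $0.50 starting price

# A trader buys 50 "replicates" shares and pays the change in cost.
trade_cost = lmsr_cost([q[0] + 50, q[1]]) - lmsr_cost(q)
q[0] += 50
print(round(trade_cost, 2), round(lmsr_price(q, 0), 3))   # ~28.12, ~0.622
```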

There are instances where betting on the futures market is motivated by reasons other than gambling, such as hedging.

If you run an airline, and you're 99.9% sure that the price of oil won't go up to $100 per barrel in the next 6 months, but you don't want to go bankrupt if it does, you may buy oil futures or a call option struck below $100, so that the payoff of that position is directly anti-correlated with the fortunes of your airline. Two opposite bets that cancel each other out.
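A toy numerical sketch of that offset, with made-up numbers (base profit, strike, premium) purely for illustration:

```python
# Toy illustration of the hedge described above. All numbers (base profit,
# strike, premium, volumes) are made up purely to show the offset.

def airline_profit(oil_price, base_profit=100.0, barrels=1.0):
    # Operating profit falls one-for-one with the oil price (simplified).
    return base_profit - barrels * oil_price

def call_payoff(oil_price, strike=90.0, barrels=1.0, premium=5.0):
    # Payoff of a call option on oil, net of the premium paid.
    return barrels * max(oil_price - strike, 0.0) - premium

for price in (60.0, 120.0):       # calm scenario vs. price-spike scenario
    unhedged = airline_profit(price)
    hedged = unhedged + call_payoff(price)
    print(price, unhedged, hedged)
# Calm: the hedge costs a small premium (40 vs. 35).
# Spike: the option's gain offsets most of the loss (-20 vs. 5).
```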

This is a well-known gambling method used by sportsbooks, so yes, it is gambling.

"Thirty years ago my GMU colleague Robin Hanson asked, Could Gambling Save Science? We now know that the answer is yes."

"Daddy needs a new Large Hadron Collider. Come on, dices. Give me a six. Finances Dean is already grilling me over the money I lost week. Give me a six. Why do you bate me?".

Next post: "How panhandling and muggings can save science"

Did you read the article?

You must be new here.

Have you ever read a definition of humour?

I'm pretty sure Dave Smith understands humor, but Thiago doesn't.

Aren't you humored yet?

That was not only very funny, it was also a clever reductio of taking the idea of gambling literally.

Thanks.

I'd like to know how this is supposed to magically negate Goodhart's law.

Which is worse - science with provocative BS conclusions that can't be reproduced, or science with nearly meaningless conclusions that can be effortlessly reproduced?

So, 1/3 of the published peer reviewed “social science” studies are junk, and it’s obvious to volunteers who pay some attention which ones are junk?

How is this different from fraud?

Anyone get fired? Lose tenure? Even have their reputation damaged? Why not?

"How is this different from fraud?"
Weak is not the same as fraudulent. A treasurer who signs the checks for the boss's bad investment idea is weak. A treasurer who embezzles is fraudulent. A psychologist who plays the media hype game is weak. A psychologist who fabricates results is fraudulent.

A question for the philosophers, I suppose. There's a thin line between p-hacking and fraud.

Psychology and social science studies should not be published until they are replicated independently. That would put a dent in this.

An unproven theory and junk are two different things. For example, when Einstein published his theory of relativity it was beautiful and logical, but it had almost no quantitative experimental support. Imagine if, 10 years later, experiments had shown that he was wrong. Would his work therefore be considered junk?

Sure, there are a lot of wrong papers, especially in the softer, humanities-adjacent fields (they are harder to predict and do not have beautiful all-encompassing laws). But that does not mean there is no reason for the whole field of study. It just means the correct answer hasn't been found yet.

There is a difference between theoretical papers (relativity when published) and experimental ones.

Any evidence the journals are upping their standards? An error rate of 1/3 is so high as to call into question basically every single paper published.

Perhaps the journals should include a warning label: "Evidence indicates that the conclusions of about 1/3 of the papers we publish, despite being peer reviewed, are wrong. We just don't care enough to do any better."

In the current milieu it is fashionable, especially for those in the peanut gallery, to blame the journals. But they are only one part of the system we consider "science". Unfortunately, perhaps, academics don't generally color inside the lines even when on their best behavior, and clawing their way to tenure and funding won't bring out their best. Journals could (arguably) do more, but in case you haven't heard, publishing is under severe economic pressure, and they're doing what is sufficient, not what is necessary. Addressing this is a systemic social problem, especially difficult when dealing with "products" that are never the same twice.

"[academic] publishing is under severe economic pressure"

What? Elsevier's profit margin in 2017 was 37% (source: Wikipedia, citing RELX annual market reports), which is a lot (most S&P companies are <10%). More data or argument please.

@Konstantinov Roman - good points, but keep in mind that Einstein's general theory of relativity, whose early tests included the non-Newtonian precession of Mercury's orbit and the bending of starlight during a solar eclipse, was according to some accounts confirmed with massaged data. The eclipse measurements, published after Einstein's theory, were allegedly selected or altered to fit the theory, because the resolution of the equipment was not good enough to confirm it. Later, with better equipment, Einstein's prediction was confirmed.

Bonus trivia (Wikipedia): lots of fake news and trolls back in the day, vis-a-vis the North Pole:

First to reach the Geographic North Pole (disputed): there were two primary claimants, Frederick Cook, and his two Inuit men, Aapilak and Ittukusuk, on April 21, 1908 and Robert Edwin Peary, Matthew Henson and four Inuit men: Ootah, Seegloo, Egingway, and Ooqueah on April 6, 1909. Peary and Henson's claim eventually won widespread acceptance, but now neither claim is widely accepted.
First to fly over the North Pole (disputed): On May 9, 1926, Americans Richard E. Byrd and pilot Floyd Bennett claimed a successful flight over the North Pole in a Fokker F-VII Tri-motor called the Josephine Ford. Byrd took off from Spitsbergen and returned to the same airfield. His claim, widely accepted at first, has been challenged since.

No, but based on mechanisms like this the authors may find it hard to get financing for their projects.

Better yet, based on this they might even be incentivized to produce better, more replicable science and this is great!

Poor quality control. The OSF has made substantial gains in upping the game with pre-registration and data-sharing, and, it is hoped, with studies like these, peer review will improve by paying more attention to p-values and sniff tests.

Also note: the ability of a _group_ to forecast does not mean an individual can do so. Market theory and/or wisdom of the crowds. (Surveys also predicted pretty well.)

But in your favor, simply looking at the P-value does pretty well too. Current peer review is just not doing its job well enough. Kudos to the OSF folks for raising the bar.

Doesn't the prediction market in this case simply confirm the biases of the majority (who have expertise in the subject matter)? Sure, the biases of the majority (orthodoxy) may well be accurate, but doesn't relying on orthodoxy risk losing the potential benefits of heterodoxy? What I mean is that research producing an outcome contrary to conventional wisdom will be greatly discounted, which gives the researcher an incentive to bend his findings to conform to conventional wisdom, or to abandon research whose results don't conform, and gives journals an incentive not to publish such papers. I'm not questioning the sanctity of markets, only pointing out that markets can be wrong, and if research is dependent on conformity with conventional wisdom, then market failure is more likely to occur. I know, there will always be the contrarian trader who seeks out places where the crowd is wrong, like the bond trader who made a fortune because the crowd wrongly predicted interest rates. The difference here is that research that is never conducted or published is research that will never be known.

I've only read Alex's post, but isn't this study only about reproducibility and not whether the experiments had positive or negative results?

Shouldn't reproducibility be a criterion for publication in a journal?

The point of a prediction market is to predict replication so that replication efforts are unnecessary. Here, the prediction market coincided with replication efforts, the former accurately predicting the latter. What's the point of a prediction market if replication efforts are undertaken anyway? To prove the accuracy of the prediction market? What's the point of that?

You don't know when, or whether, anyone will attempt to replicate any particular study. Anyone can attempt a replication and submit their results to a journal for publication.

So what you can do is have tradable contracts each time a study gets published. Their prices will go up and down until a replication attempt is reported. Some studies may go decades without one, others only months.

The replication attempts make the market (without them there'd be no hope of the market ever settling). They can also be an incentive to do replication testing (for example, if you happen to really believe study A is good but the market deems it bad, you could fund a replication attempt). Since most studies will probably not get a replication attempt, the ones that do verify how reliable the market prices are for the many studies that don't.

Even better: over time we can even avoid replication efforts "per se" and set empirical rules that will predict if a study will or will not be replicable.

And that may incentivize best practices and lead to a better allocation of funds for scientific research.

1. This is a blow to the 'alternative facts' community that likes to cite things like the 'replication crisis' as a sign that you can't really trust even what appears to be settled science. The scientific community does seem to have an intuitive sense of which results are more reliable, and in theory even those outside the field could use the results of a prediction market to make educated guesses about which theories they should feel comfortable relying on.

2. This knowledge is likely to crop up somewhere else: research that builds upon previous experiments. If you're building an experiment or theory that presumes, say, the marshmallow experiment holds, you are placing a big bet by essentially saying you will assume this ground is solid enough to build a house on.

I wonder if this might be an interesting method to target bits of research that falter when attempts are made to replicate them? I imagine items that are widely cited yet rarely have additional studies or experiments that strictly depend on them being true would be prime targets for weeding out.

I'm not sure what the 'alternative facts' community is, but I think your conclusion in #1 is wrong. This study indicates that scientists know which studies are fiction and which are likely good science, but allow the fiction to be published anyway. Any given study, especially one with some political impact, could be said to be one of these wrong-but-published-anyway studies.

If anything, I'd say this actually reinforces skepticism of science. The only thing that separates science from propaganda is independent replication and confirmation, and 38% of the studies examined were not only not replicable, but the fact that they would fail replication was obvious even to a reasonably educated layman. What are journal editors even doing if not filtering these out?

+1

It's worth repeating that the 38% failure rate was for Nature and Science, too, not the Quarterly Journal of Wasted Grant Money or whatever. If the ratio of signal to noise is that bad at the reputable journals, what does it look like for lesser publications?

It might be better. The lesser publications will publish stuff that is less surprising. That is, stuff more likely to be sound.

Very true. Especially because we are talking about Nature and Science here. Those are ridiculously hard journals to get into, and the very least they could do is not publish studies that the average researcher can see have a high likelihood of not replicating.

I think the only possible counter is that these journals want to publish non-intuitive results, the same type of result a neutral observer would reasonably assume won't replicate (because it conflicts with conventional wisdom). But then the journal should require a much higher bar of proof to publish in this case. The fail-to-replicate rate in Science or Nature just has to be much lower, or the whole exercise is useless.

Yeah. Whatever the reason, it would be interesting to see a "reproducibility ranking" which may shed light on the kind of results that are dubious but are perhaps being "pushed" versus those that are legit.

Kahneman's a good egg who can navigate the numbers; maybe he should weigh in. I sometimes wonder which of the findings from Thinking, Fast and Slow have fallen by the wayside.

I think it's a big mistake to characterize the results as saying that to "the average researcher" or "a reasonably educated layman" it was obvious that these studies would not replicate.

Rather, the most expert people, those best able to evaluate the studies, accurately gave low probabilities of replication to the weak studies. The "average" researcher probably stayed out of the betting; only the very confident participated, since only those who think they know something others do not will be convinced they are likely to win money.

That suggests something rather different: perhaps that there are experts at judging the strength of studies, or else that certain people, perhaps experts in a given research area, can judge such studies. Hard to say exactly, but not what you are concluding.

The surveys were considerably less accurate, though still decent.

It could also imply that they understood some experiments are more difficult to pull off than others. In addition they might also be guessing that some relationships are more sensitive than others. No matter how hard one tries, no two teams will do the same experiment in the exact same way. If the result is very sensitive to such variations, you may get a replication failure even if the original study was valid.

First, inability to replicate something doesn't mean it is fiction. This is especially true of psychological experiments. Humans are not electrons, so there is no law of nature that says people in one time and place will tell you how people in another time and place will respond to something. In other words, an experiment done in 1965 to test how people react to seeing a message in red typeface may have been done perfectly and the results reported with utmost honesty, but it may simply be the case that in 2018 people no longer respond the same way. Why the response changed may itself be worthy of a lot more study. On the other hand, if the response did stay the same, that's very interesting and it may be something we can rely on.

Second, I wouldn't read the results as people 'knowing' which replications would work and which would fail. Instead I think the analogy to sports betting is apt. The market doesn't have to fix the game to get to reasonable beliefs about which teams are likely to win. If you think of the studies being replicated as 'teams' who happened to have won a game a long time ago, then you can think of the market results as being a measure of how much the community believes those teams won because they are inherently strong versus having won simply by a fluke that you can't count on happening in future games.

To your first point, yes, it's true that something that does not replicate isn't necessarily fiction, but it isn't science either.

Not sure why it isn't science. It's an observation. Random chance says that when you're doing hundreds of studies at a 95% confidence cutoff, you're going to get dozens of false positives. Only by repeating tests, testing from different angles, and reporting the results will you know whether you just drew a strange sample by chance, or had a relationship that changed over time, or a relationship that remains stable.
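A back-of-the-envelope sketch of that point; the study counts, share of true effects, and statistical power are invented assumptions:

```python
# Back-of-the-envelope version of the point above. The number of studies,
# the share testing a real effect, and the power are invented assumptions.

n_studies = 500           # hypothetical studies run
share_true_effects = 0.2  # assume only 20% test a real effect
alpha = 0.05              # false-positive rate at a 95% confidence cutoff
power = 0.8               # chance a real effect is detected

false_positives = n_studies * (1 - share_true_effects) * alpha   # 20
true_positives  = n_studies * share_true_effects * power         # 80

# Even with flawless conduct, 20 of the 100 "positive" results here are
# noise, i.e. one in five would be expected to fail replication.
print(false_positives, true_positives)
print(false_positives / (false_positives + true_positives))      # 0.2
```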

Well, yes, but if you are going to go around and say that science proves X, Y or Z, then you need to have actually proved X, Y or Z. Until you have done that, it's totally reasonable for people to be skeptical of what you say.

This study indicates that the 'alternative facts' community is correct to be skeptical, as it seems to me that 1 out of 3 times you drew a strange sample by chance, as you say.

In debate with someone who claims X, you can always call their bluff by asking their source. When they say X because of a study, you can then ask them what other sources they have to support their belief.

A perfectly done study with 95% confidence will still report a positive result that's actually just random noise 5% of the time. It's 100 years later and we still read stories about "so and so tests Einstein and confirms he's right!". To say something is proven requires more than a single study and even requires a lot more than a replication of a single study.

In my example, it may be that people in 1965 had a reaction to red type. Maybe multiple studies show the same reaction. Maybe even 40 years later people are still showing the same reaction. Even after all this it may just be that we're seeing a cultural artifact or some type of fad rather than some part of the human mind that gets a particular type of reaction to red type. Proof would require not just replication but understanding from multiple angles that can be backed up by not just replication but different types of replication.

I might be obtuse, but I don't get the headline. Save science from what? And how? Is the problem that replication is long and tiring and expensive, and the solution is to replace it with gambling, which is much more fun? That doesn't make much sense to me.

Errr, no, the 'gambling' is on whether or not a result will pass a replication test. The test still has to actually get done in order for the bets to eventually be settled.

Then what is saving science?

1. Making replication testing 'exciting'.

2. Providing a reliable way to estimate how trustworthy results are even if the replication testing hasn't yet been done.

"saving science" sounds a bit overblown. Perhaps 'doing science marginally better' would be more accurate.

It also seems to me this could help allocate a limited budget for replicating studies. I would first target studies the market clusters near a 50-50 price, as they indicate uncertainty in either direction, and then target studies the market puts near 0% or 100%, as those are the results the community has effectively settled on. Also, testing the studies the market is most sure about would be a way to replicate this study....
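A minimal sketch of that prioritization rule, with hypothetical studies and prices:

```python
# Minimal sketch of the prioritization rule described above. Study names
# and prices are hypothetical; a price is the market's implied probability
# of successful replication.

prices = {
    "study_A": 0.52,
    "study_B": 0.91,
    "study_C": 0.18,
    "study_D": 0.47,
    "study_E": 0.75,
}

# First pass: the most uncertain markets, i.e. prices closest to 0.5.
by_uncertainty = sorted(prices, key=lambda s: abs(prices[s] - 0.5))

# Second pass: the markets the community is most settled on, i.e. prices
# closest to 0 or 1, to audit the consensus itself.
by_consensus = sorted(prices, key=lambda s: abs(prices[s] - 0.5), reverse=True)

print(by_uncertainty)   # ['study_A', 'study_D', 'study_E', 'study_C', 'study_B']
print(by_consensus)     # ['study_B', 'study_C', 'study_E', 'study_D', 'study_A']
```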

It would be better to use this as a gate to getting published in the first place. There is a huge incentive misalignment in academic publishing. Novel, counterintuitive, or politically useful research is selected for. The problem with this is that such work is much more likely to be BS that should not be published at all. Once it's out there, though, it's fodder for fact checkers and hacks.

Use a prediction market internally for the journal and externally within the field. Compare the two and hold the journals to account. If a sociology or psychology journal has a low estimated replication rate, then we can give context to the public. "New study says ... but this journal's studies are false 60% of the time" is a much less misleading headline.

Sunshine and transparency.

How would you do a prediction market on unpublished studies to decide what to publish? To me that seems like trying to do sports betting when you haven't had any actual games yet.

Perhaps what we could do is have an open market for studies published in journals. Contracts would exist for each study published (when and if a replication attempt is made is unknown). You can then report on the journal as if it were a mutual fund holding shares of various companies. The value of its studies will go up and down with the market, indicating a view on the journal's vetting process.

How about this:

The open betting market for journals: Sociology X has a prediction market rate of 40%. Explain to journalists that this needs to be included in every article.

Before the study is published, it’s put on the market. That way both are reported on accurately.

“Study x says....however, this study only has a 20% chance of being replicated/true. It’s been published in a journal with a 50% rating.”

Part of the issue is these studies are reported as the truth handed down by Science Herself. Once they’re out there, they’re out there. And any ‘fact checking’ will use them as a reference as if they were literally true.

It needs to be 'rated' prior to the New York Times headline.

I think we are mixing up two different things here. The journal that publishes the article doesn't alter the odds of it being replicated.

Think of it like individual stocks. An individual stock may perform well or badly. A mutual fund may perform well or badly. A mutual fund's performance is just the sum of the stocks it owns. If a good stock happens to get purchased by a fund with a bad performance, the bad fund doesn't drag down the stock; the good stock pulls up the fund.

So in a large prediction market you can view the journals like mutual funds and evaluate them as a whole based on the performance of the studies they publish and the price changes that happened afterwards... but for an individual study you should report its prediction price only. When you publish a story about Facebook you quote Facebook's price and its change, not whether it's owned by good or bad performing funds.

Of course prices move over time so all you can get in a NYT story about a study will be its price at the moment.

It provides a clear price signal though. It offers a feedback mechanism and an incentive for journals to not publish garbage.

It also frames the context for Science TM. It adds to the journalism so we don't get the Vox headline "Repubs say X; the new study proves they're lying liars," which is only a slight modification of an actual article and headline on the website literally right now.

"Sociology X, a journal that prediction markets say is correct 1/3 of the time, published a study today..." gives a very different impression to a reader than "Science says X, now go forth and use this as fodder for the culture wars!", which is basically what's happening.

The journal needs to be spoken for in the context of the article. We can absolutely infer odds from that.

It’s not a mutual fund because a specific journal publishes article y. They don’t all buy a share in the article.

A few problems:

1. Many studies can't be replicated to begin with. For example, Sociology X might have an article about elementary school boys in Spain who went on to fight on the opposite sides of the Spanish Civil War. Observational analysis like that can't be replicated.

2. Of the studies that could be replicated, only a few will actually get the effort.

So Sociology X might publish 100 articles, with only 50 of them testable like that and only 9 actually getting the treatment. So you could read 6 failures of replication as "correct only 1/3 of the time," or as challenging at best 6% of the articles published.

3. Replication can be replicated. Journal X published a study that says when people are shown an article in red type, they tend to reject calls for charity. Journal Y five years later publishes a study showing an attempt to replicate X failed. Journal Z five years later publishes a study showing the red type test worked.

So the original idea was in X and you can say two attempts to replicate split 50-50. Or you can say X failed to replicate when Y published. But then Y failed to replicate when Z published after. What if Z published before Y?

I think here the cumulative judgment would be that the red-type effect seems to hold up, since 2 of the 3 tests of it had positive results. But I would make each contract pair off with one replication. So one contract pays if X replicates in Y's result, another if X replicates in Z's.

Next issue is who decides what a replication is?

Interesting discussion. Some initial thoughts, these are not fully formed so tear apart as needed. Part of my gripe here is the bs ‘sciences’ and the other part is the way it’s reported in journalism and used as a bludgeon.

1. If it's a data-set approach, it's not science and should not be called science. Either it's an experiment or it's not; either it's a randomized controlled trial or it's not.

If it's a historical data set used to tease out econometric or statistical relationships, then the only number reported in the news should be the 95% confidence interval. If the coefficient of the key variable flips sign within the 95% confidence interval, that needs to be the takeaway from the article!

2. Then it needs to be said. Not "study X proves Y." Treat it thusly: "An unreplicated study published in journal Z, whose studies fail to replicate 38% of the time, suggests that ...."

3. Then there's an easy solution. A study replicated 66% of the time says X. The chance of this occurring randomly in 2 of 3 attempts is Y.

Anytime I read a sociology paper being cited as proof in a newspaper I roll my eyes. Anytime I read a psychology study being cited in a newspaper I cringe.

Maybe the downstream problem is journalism, but there’s one hell of an upstream problem as well.

Saving social "science" from bullshit is how I took it.

If only we could bet on the accuracy of prediction markets.

I think the accuracy of the prediction market (if you mean how many errors it makes by saying X when in fact it's not-X) is built into the prediction market. It's the spread between those that say "yes" and those that say "no": the higher the spread, the greater the uncertainty. Eventually enough people decide the spread is sufficiently narrow to dip their toe in. But if prediction markets consistently screw up and award money to the side that lost the bet, eventually this will seep out and the market will reflect it, either in people being unwilling to play or in only the uninformed making bets. Actually, Wall Street works the same way; it's said that the only reason there's liquidity in the market is 'noise traders' (the uninformed) who bring together savvy buyers and sellers.

Yes, so how will 'gambling' save science? We still need to do the replication test, and we do not need 'gambling', do we?

It seems to me that this experience indicates that the scientific community collectively has more information on the quality of social science research than just the fact that it is published in Science or Nature. Hardly a surprise, but I am happy to see it confirmed. Now how can we leverage this collective knowledge? What's the plan?

You could always use the betting market to prioritize your replication testing.

For those who aren't social scientists, and thus probably wouldn't know it: the best social scientific research doesn't make it into Science and Nature. The most eye-catching hyped-up social science research makes it into Science and Nature. I would fully expect the replication rate to be higher if we sampled stuff published in the American Economic Review or American Political Science Review (top journals in each discipline).

The actually fraudulent LaCour and Green study that got published in Science, for example, had been rejected by the APSR. We should expect this, as the reviewer pool is going to be better (editors know whom to ask), and also for the less interesting reason that papers are a lot longer and reviewers expect a lot more in the way of secondary analyses.

I guess one implication of these results is that active investors add tremendous (external) social value, even (especially?) if one doubts the private returns to active management. Active investors provide free research for everyone else, the results communicated through market prices. Moreover, many markets exist without subsidization, calling into question claims that government is necessarily required to handle externalities. Many people have asserted that we devote *too many* resources to active investment management, which is actually a claim that the market overprovides a positive externality.

Only halfway through morning coffee and I see over thirty comments on "gambling saving science" without the first reference to:

https://en.wiktionary.org/wiki/God_does_not_play_dice_with_the_universe

Gambling qualifies as the apt methodology for all so-called "social sciences", perhaps possibly maybe (Las Vegas, the world capital of social science research, in which case).

(How much have findings from quantum mechanics made their way into social science methodologies, btw?)

Why would anybody make such a pedestrian observation in the comments? ;-) Einstein lost that bet. As for your quantum mechanics comment, there was a modern physicist who wrote a social science paper, full of gobbledygook quantum-mechanics terms, that got published. And in the soft sciences like economics they borrow quantum-mechanics analogies all the time (Google "Economics, Dark Matter and Ricardo Hausmann (2006)").

Wanna bet this study doesn't replicate?

Humorous. But if that's a serious offer, I'll bet heavily in favor of it replicating. That is, a similar study would find a replication rate within the confidence interval, and that a prediction market/survey accuracy would be within the confidence interval.

"The replicators ran much bigger versions of the original studies, recruiting around five times as many volunteers as before. "

Any self-selected "volunteer" is just as good as anybody else on the planet ... when doing rigorous social science research (??)

That random sample stuff is so over-rated

For most psychology experiments, you can't run the same volunteers twice. Also, we only care about studies that claim to hold outside the original dozens or hundreds of participants. So yes, a replication by definition should work with different volunteers.

A failed replication makes it more likely the original result was a fluke: not enough variation controlled, or some other flaw.

So the survey was almost as accurate as the market? Especially at the bottom?

It seems so. In the supplement, the mean absolute deviation is .348 for the survey and .303 for the market. (A bit worse when counting only the first replication attempt for studies that had two.) That makes the market about 14% better, and if I read correctly that difference is statistically significant, while the differences in correlation were within error.
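For what that comparison means in practice, here is a minimal sketch of a mean-absolute-deviation calculation; the prices, survey beliefs, and outcomes are fabricated placeholders, not the supplement's numbers:

```python
# Minimal sketch of a mean-absolute-deviation comparison like the one
# reported in the supplement. These prices, survey beliefs, and outcomes
# are fabricated placeholders, not the paper's numbers.

market_prices  = [0.80, 0.35, 0.60, 0.20, 0.90]
survey_beliefs = [0.70, 0.45, 0.55, 0.35, 0.80]
replicated     = [1,    0,    1,    0,    1]     # 1 = replicated, 0 = not

def mad(predictions, outcomes):
    # Mean absolute deviation between predictions and binary outcomes.
    return sum(abs(p - o) for p, o in zip(predictions, outcomes)) / len(outcomes)

print(round(mad(market_prices,  replicated), 2))   # 0.25
print(round(mad(survey_beliefs, replicated), 2))   # 0.35
```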

The chart breaks down a range of choices for what the market or the survey predicted into a simple yes-no, but the result isn't a simple yes-no. How far off were the predictions?

Yeah, but all of these studies got published in the first place. Which means that scientists weren't as good at predicting replicability the first time through.

TL;DR. But take a critical look at the graph. The ordinate is "fake". The arrangement has, as far as I can see, no justification other than to manufacture what seems to be a linear relationship. I'm not sure what that means (that it can be so arranged), are you? Also, take a look at the values between 0.4 and 0.6. I do not see the graph as suggesting that if the market rates a study 0.4 and another at 0.6, that the latter is more likely to be replicated than the former, only the extremes 0-0.3 & 0.7-1 seem predictive...it seems to me.

A better graph would show a delta between the predicted confidence interval and the actual confidence interval.

I wonder if this could work for monetary policy.

So here is how you get proof in social science: try to teach it through experiment.

A little over 20 years ago, I took a class from Dick Thaler in the new field of Behavioral Economics. Having had classical Economics and found its ability to be predictive a little lacking, I was skeptical. But Dick's thesis was, "classical economics fails because most of its underlying assumptions are provably false." That was pretty ballsy (although if you know Professor Thaler, you know it's entirely in character).

So, how do you prove the thesis? You challenge the assumptions, one by one, by having the class do experiments. I clearly remember demonstrating just how bad humans are at understanding randomness by rolling a die 20 times. I was no longer a skeptic about Behavioral Economics (and I applauded when he won a well-deserved Nobel).

How many social scientists would be willing to try to replicate their results every semester with their students as subjects? My prediction: at least 38% would not want to try.

A few additional thoughts:

It's actually pretty hard to define what one means by 'replication failure'. How about Milgram's famous 'six degrees' experiment, where he sent people letters? If this were done today via Facebook and failed, would that be a replication failure, or would you have to do the experiment exactly the way he did it? If you did it by letter, though, how do you account for the fact that in a world where people get text messages, emails, social media messages, etc., it's harder to pay attention to the mail from the postman?

If real money is at stake, I could see these 'futures contracts' getting bogged down in endless legal arguments over whether or not a replication really happened or the test was flawed. I could also see conflicts of interest if the researcher purchases contracts going one way or the other thereby essentially betting on the results he is going to announce.

This is a lot of work that doesn't really uncover new knowledge, IMO. The study above establishes that, as a community, scientists have a range of confidence in the results of previous studies, and generally the ones they have less confidence in are more likely to fail a replication test.

This makes sense because studies are investments. If you create a study that is premised on some previous result holding, then you are risking all that time and effort being wasted should your assumption fail. If you strongly suspect some study will fail replication, then you have an incentive to invest in designing a study that tests that result head-on. If you're right, you get the study published and the 'fame' of overturning a previously held idea.

I'm just thinking, though, that the more you think about adding a large futures market to studies, the less you will get out of it and the more potential problems you'll introduce.

The y-axis should have been the ultimate p-values - I have no idea from the data provided whether surveys get the job done or are even better than "gambling". And surveys are simpler and much less cute.

Let's bring the idea of the blockchain into this mix.

When deciding to do a study, a researcher can:
a. Ignore previous results and study something not yet measured.
b. Replicate a previous positive result.
c. Build upon a previous positive result.
d. Replicate a previous negative result.

Studies cost money of course but for the scientist they cost time. After spending the time to do the study dramatic, important results help advance the scientist's career.

It seems to me the replication options, b and d, are relatively 'safe bets'. Because tests were previously done, this is something that can be tested. Replicating a positive result does advance science, and given the attention to replication it probably will be publishable. But it probably won't be dramatic. There is the possibility that the replication will conflict with the previous test, yielding a dramatic difference in result... which could get your name attention.

A and C seem high risk and moderate risk. A because you're prodding brand new ground and C because you are trusting the previous result holds while you try to build something on top of it.

I would say we may not need prediction markets here. Instead we should map how many results have been the subject of replication attempts and how many tests were built on top of old results.

Building results on top would indicate high confidence in the underlying test, as would replicating it. A result that has been ignored since publication is likely suspect. Why aren't people building on it and replicating it? Possibly because they suspect it isn't a foundation that will hold up to scrutiny. Since it hasn't been used, the value in overturning it with a replication test is minimal. So the results are allowed to 'stand', but they aren't really used.

So this would require building a network of results and tracking how they either replicate or build on previous results. The results with the thickest layers of follow-up and replication would be the safest bets to hold up.
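A rough sketch of what such a network and a naive "support" score might look like; the studies, edges, and scoring rule are all hypothetical:

```python
# Rough sketch of the "network of results" idea: studies as nodes, with
# "replicates" and "builds_on" edges, and a naive support score that counts
# follow-up work. Study names, edges, and the scoring rule are hypothetical.

from collections import defaultdict

edges = [
    # (later_study, relation, earlier_study)
    ("B", "replicates", "A"),
    ("C", "builds_on",  "A"),
    ("D", "builds_on",  "C"),
    ("E", "replicates", "A"),
    # Study "F" was published but never replicated or built upon.
]

support = defaultdict(int)
for later, relation, earlier in edges:
    support[earlier] += 1   # each follow-up counts as one unit of confidence

for study in ("A", "C", "F"):
    print(study, support[study])
# A has the thickest layer of follow-up work (3); F's score of 0 marks the
# "standing but unused" result the comment flags as suspect.
```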

Comments for this post are closed