How big a deal is replication failure?

by Tyler Cowen on July 8, 2014 at 2:18 am in Data Source, Science, Uncategorized

From Jason Mitchell at Harvard:

Recent hand-wringing over failed replications in social psychology is largely pointless, because unsuccessful experiments have no meaningful scientific value.

Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way. Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them.

Three standard rejoinders to this critique are considered and rejected. Despite claims to the contrary, failed replications do not provide meaningful information if they closely follow original methodology; they do not necessarily identify effects that may be too small or flimsy to be worth studying; and they cannot contribute to a cumulative understanding of scientific phenomena.

Replication efforts appear to reflect strong prior expectations that published findings are not reliable, and as such, do not constitute scientific output.

The field of social psychology can be improved, but not by the publication of negative findings. Experimenters should be encouraged to restrict their “degrees of freedom,” for example, by specifying designs in advance.

Whether they mean to or not, authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues. Targets of failed replications are justifiably upset, particularly given the inadequate basis for replicators’ extraordinary claims.

The full piece is here; I don’t quite buy it but a useful counter-tonic to a lot of current rhetoric. I found this in my Twitter feed, but I forget whom to thank, sorry!

Addendum: An MR reader sends along this related argument.

Nominull July 8, 2014 at 2:25 am

On the emptiness of Jason Mitchell’s head.

Reply

Rich Berger July 8, 2014 at 7:29 am

I am sure that his head is not empty. The problem is what it is full of. He seems to be lacking in self-awareness or humility regarding the limits of human knowledge.

Reply

replication July 8, 2014 at 11:29 am

More from John List on this issue (“One Swallow Does not Make a Summer: New Evidence on Anchoring Effects,” American Economic Review (2014), Volume 104, Issue 1, Pages 277-290): In the model we show that the common benchmark of simply evaluating p-values when determining whether a result is a true association is flawed. Two other considerations—the statistical power of the test and the fraction of tested hypotheses that are true associations—are key factors to consider when making appropriate inference. The common reliance on statistical significance as the sole criterion leads to an excessive number of false positives. The problem is exacerbated as journals make ‘surprise’ or counter-intuitive results necessary for publication. But, by their very nature such studies are most likely not revealing true associations—not because of researcher malfeasance, merely because of the underlying mechanics of the methods.

While this message is pessimistic, there is good news: our analysis shows that a few independent replications dramatically increase the chances that the original finding is true. As Fisher (1935) emphasized, a cornerstone of the experimental science is replication. Inference from empirical exercises can advance considerably if scholars begin to adopt concrete requirements to enhance the replicability of results, as for instance starting to actively encourage replications within a given study.

Reply

Alex Godofsky July 8, 2014 at 2:25 am

Of course the original positive result surely came from a study conducted by flawless experimenters.

Reply

Stuart July 8, 2014 at 2:36 am

Bingo.

Reply

Alexei July 8, 2014 at 5:33 am

Kept waiting for this obvious point to be addressed. Never happened…

Reply

anon July 8, 2014 at 7:12 am

+1

Reply

Silas Barta July 8, 2014 at 5:44 pm

+1. The possibility that the original success was a fluke is never addressed? Yikes.

Reply

ChrisA July 8, 2014 at 2:29 am

If a result in social science cannot be easily replicated by other researchers, except by very closely following exactly the same protocols and setup as the original researchers, it suggests that the effect is not very significant and probably should be ignored. For instance, in the various priming experiments, we are led to believe these were very significant indications of flaws in the way that humans think and act – but that can’t be true if they cannot easily be replicated.

Reply

A July 8, 2014 at 2:41 am

In his world, if one guy safely lands with an experimental parachute, and every subsequent jumper plummets, then there is likely some error on the part of the subsequent jumpers. Certainly, the first jumper is unimpeachable in execution, analysis, and reporting.

Reply

Steve Sailer July 8, 2014 at 4:55 am

But the initial experiment is not even one person surviving jumping out of an airplane: it’s students, after seeing words related to old age, taking an average of one second longer to walk down the hall.

Reply

Chris July 8, 2014 at 9:25 am

Exactly what is the applicability of research whose protocol must be followed precisely and perfectly to achieve a result? Any such finding would have no useful application in the real world anyway.

Reply

Rahul July 8, 2014 at 2:30 am

I wonder if the guy has the same attitude towards replicates in things that matter, e.g., steel tensile-strength testing, pharmacokinetics, etc.

Reply

David H. July 8, 2014 at 3:07 am

Nice! So if I’m building a bridge and using a certain kind of cable, I look up its previously measured tensile strength. But then for some reason I also measure its strength using exactly the same procedure as the original measurement, and the thing breaks under half the rated load.

But now Jason Mitchell from Harvard comes in and says: “That was a pointless exercise. Being fallible, you probably screwed up the replication, so fuck it, full speed ahead! I’m Jason Mitchell from Harvard!”

Reply

Steve Sailer July 8, 2014 at 4:58 am

“Full speed ahead!” is pretty much the only politically acceptable thinking allowed in the human sciences these days, unless you want to end up like Jason Richwine or James D. Watson.

Reply

Axa July 8, 2014 at 5:33 am

Even the physicists who were busy developing quantum mechanics in the early 20th century spent some time thinking about the ether hypothesis. They debated with ether supporters instead of crying that the world is full of mean-spirited people. But, giving Mr. Mitchell the benefit of the doubt: if he is upset with quackery, perhaps his greatest contribution to science would be writing an article on all the implicit knowledge needed to use a method successfully.

A little context on science writing: to publish your research, you are constrained by journals’ limits on pages or words. Of course, you use more space for discussion of results than for methodology. If someone tries to replicate the experiment based only on the methodology found in the article, the most probable outcome is not positive. That’s why you put a contact email in the article: if someone is interested in replication, you share the whole methodology, free of the word-count axe.

The problem here is attitude. Mr. Mitchell is too busy doing “science” to share or discuss knowledge with others.

Reply

Rahul July 8, 2014 at 5:59 am

These days many journals allow supplementary online materials. Post 100 pages of excruciatingly detailed instructions there if need be. Lack of space has become a pretty weak excuse.

Axa July 8, 2014 at 6:22 am

Exactly. You can put all the details in your writing and stop faulting others for not having the “implicit knowledge.”

Brian Donohue July 8, 2014 at 9:30 am

Ring the alarum!

Bad science is generally exposed as fruitless in short order. Think Lysenko. In the ‘human sciences’, things can linger, but the truth will out. Anyone paying attention understands a dubious cloud hangs over social psychology right now.

The meta-issue, in my mind, is coming up with well-designed experiments that can answer specific questions, rather than trying to ‘prove’ larger, more contentious issues. Go where the small answers lead.

Science can be done here, but one must be careful. Still, there are those who, for whatever reason, want to tar the whole field with the brush of nonsense. This goes too far.

Here’s Kahneman on the subject from two years ago: http://www.nature.com/polopoly_fs/7.6716.1349271308!/suppinfoFile/Kahneman%20Letter.pdf

Reply

Steve Sailer July 8, 2014 at 9:47 am

Dubious science that suits the desires of powerful political or economic interests can be around for a long time, stifling awareness of good science. Lysenkoism lasted into the 1960s in the Soviet Union, while Stereotype Threat has been popular and respected for about as many years as “The Bell Curve” has been denounced as “discredited.”

A July 8, 2014 at 2:38 am

There is a lot of strange logic in that article, particularly the claim that the inability to reproduce results amounts to nothing more than a negative assertion. Priors never change, apparently, except in response to validating outcomes.

Reply

Rahul July 8, 2014 at 2:40 am

And to think this guy is a full Professor at Harvard.

Reply

anon July 8, 2014 at 7:15 am

Thank goodness he’s at Harvard, where he can do no damage.

Reply

Candide III July 8, 2014 at 7:44 am

You are way too optimistic. Where do you think our best journalists, NGO staffers and civil servants study?

Reply

prior_approval July 8, 2014 at 2:40 am

‘because unsuccessful experiments have no meaningful scientific value’

There is a reason that some people think any variety of psychology is not actually a scientific endeavor.

‘The field of social psychology can be improved, but not by the publication of negative findings’

Why spoil a good narrative, at least when storytelling decides to call itself science?

‘Whether they mean to or not, authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues’

Man, I never knew that a prohibition on impugning the scientific integrity of one’s colleagues was the standard by which the concepts of phlogiston and ether could have been retained. Time to restore Johann Joachim Becher to his place, while condemning Antoine-Laurent Lavoisier for impugning an alchemist’s scientific integrity.

Reply

David Wright July 8, 2014 at 4:17 am

+1. God help me, but I have to endorse a p_a post. Every damn word.

Reply

David Wright July 8, 2014 at 5:14 am

Okay, I went and read the whole piece, and it’s just as bad as the excerpt. Worse, I suppose, since it’s just as whiny and wrong and blind to the whole point of science, but manages to drone on as well. Is Tyler sure this isn’t parody? Are we sure Tyler’s claim that it has value isn’t irony? I suppose there is an iota of insight buried in there, as all good parody requires, but only enough to justify millimeter steps toward the miles-off conclusion to which the author jumps.

Reply

Turkey Vulture July 8, 2014 at 5:26 am

Below, Steve Sailer references something Mitchell wrote that I assume must have been completely separate from the piece Tyler excerpted and you read. And it seems to convey the same thinking. So this might be For Real.

Reply

Steve Sailer July 8, 2014 at 5:32 am

“Is Tyler sure this isn’t parody?”

I wish it were, but I suspect that you don’t dare parody Stereotype Threat and Implicit Bias if you value your career in social psychology. That’s racist!

Reply

Silas Barta July 8, 2014 at 5:49 pm

‘Whether they mean to or not, authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues’

… and psychology has officially jumped the shark as a science.

If you can’t submit contradictory results without being accused of social hostility, you no longer have a safe environment to conduct science.

“Hey, I tested it out — turns out that heavier stuff doesn’t actually fall faster, like we always thought.” –> “How DARE you impugn the integrity of [Ancient Greek philosopher]?”

Reply

Thanatos Savehn July 8, 2014 at 2:42 am

I smell desperation. Imagine those who “discovered” cold fusion making this very same argument, and then imagine the smell it would leave in your nostrils.

Meanwhile, poor dead Quine, who never meant to enshrine first-to-be-published “science” as eternal and unassailable truth, must be spinning in his grave. Stuff, as they say, happens; and what (ideas) can be exploited will be exploited. Always.

Reply

David H. July 8, 2014 at 2:57 am

This article is a cautionary tale of what can happen when a smart scientist (with an axe to grind) tries to become a philosopher. The argument he makes would not get a passing grade in an undergraduate philosophy of science class. (OK, with grade inflation it might be a D.)

I’ve read it over twice now to make sure that I’m not misreading what he’s saying. Maybe I’m still missing something. But it sounds to me like he’s saying that because humans are fallible in the lab, replications that don’t reproduce a previously alleged signal can all be explained away as human error, so they show nothing. “Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them.”

But if he’s so goddamn skeptical about the capacity of fallible scientists to teach us something when they attempt a replication, why does he completely forget that skepticism when the same fallible scientists try to teach us something in the first place? Did he think that the initial experiment was performed by infallible agents? The positive signal measured in the initial experiment could not have been produced by some sort of mistake? To be fair, he’s not explicitly saying that nothing could have gone wrong on the first pass; he’s only saying that the unreliability of replication experiments makes these mistakes unknowable. So we just have to relax and accept the initial results as truth. But come on, that’s just stupid. I’d expect this kind of argument from people who bend spoons and read your palm, but real scientists should absolutely distance themselves from this pattern of argument!

Reply

Rahul July 8, 2014 at 4:59 am

The blend of ignorance, sheer stupidity & arrogance that permeates his whole article would have been a firing offence in almost any sector outside of academia.

Reply

Turkey Vulture July 8, 2014 at 5:03 am

Yes I also hope I am missing something profound beneath the seemingly stupid exterior of his argument.

Reply

Steve Sailer July 8, 2014 at 9:49 am

It’s like an argument denouncing an experiment that fails to support faith healing: the scientists just didn’t have enough faith. Their skepticism ruined everything.

Reply

Marian Kechlibar July 8, 2014 at 9:50 am

His incredible argument, presented quite seriously, illustrates the chasm between the “real” hard sciences and those who try to ape them.

It also illustrates the unfortunate fact that many university departments are “theology light” and cannot get rid of the sordid baggage of pontification and infallibility of senior figures.

Reply

Steve Sailer July 8, 2014 at 2:58 am

Lots of stuff in psychology replicates just fine, even over the generations, such as the interrelationship of IQ, school achievement, job trainability, race, crime, employment, etc. etc. The U.S. military, for example, has been using cognitive testing for 97 years and finds it works quite well:

http://www.amazon.com/Research-Selecting-Classifying-Personnel-1917-2011/dp/077342654X

But social psychologists don’t want to notice that. Instead, they want to do silly experiments about whether you can manipulate college students into walking a slightly different speed to the elevator. Don’t we know already that it’s not hard to manipulate college students into fads?

But don’t we also know that fads wear off?

Reply

Steve Sailer July 8, 2014 at 3:14 am

Here’s Jason Mitchell claiming that failed replications of the beloved “stereotype threat” phenomenon just prove that “replicana” is dubious:

“The recent special issue of Social Psychology, for example, features one paper that successfully reproduced observations that Asian women perform better on mathematics tests when primed to think about their race than when primed to think about their gender. A second paper, following the same methodology, failed to find this effect (Moon & Roeder, 2014); in fact, the 95% confidence interval does not include the original effect size. These oscillations should give serious pause to fans of replicana. Evidently, not all replicators can generate an effect, even when that effect is known to be reliable. On what basis should we assume that other failed replications do not suffer the same unspecified problems that beguiled Moon and Roeder? The replication effort plainly suffers from a problem of false negatives.”

Economist John List has a much more cynical appraisal of the wildly popular “stereotype threat” notion.

Reply

Turkey Vulture July 8, 2014 at 5:08 am

Okay, this seems to confirm that he is an idiot: a successful replication, apparently, can’t be the product of one of the mistakes he says beguile the failed replications; instead, a successful replication proves the validity of the original finding.

Reply

Steve Sailer July 8, 2014 at 5:19 am

Few Harvard professors are idiots. Some are disingenuous, though. They can always justify doing what’s in their career interests on the grounds that they are lying in a higher cause: to fight racism, sexism, heterosexism, cis-sexism, etc.

Reply

Turkey Vulture July 8, 2014 at 5:32 am

Even with Harvard Profs, I try to apply “never ascribe to malice that which can be adequately explained by stupidity.”

Steve Sailer July 8, 2014 at 6:25 am

“It is difficult to get a man to understand something, when his salary depends upon his not understanding it.”

Brian Donohue July 8, 2014 at 12:27 pm

@Steve, good quote but again I think you’re being too cute. As the UChicago excerpt below shows, there are more risks out there than just keeping your boss happy. Facts are stubborn things.

gwern July 8, 2014 at 12:43 pm

As the philosophy saying goes: one man’s modus ponens is another man’s modus tollens.

Reply

Steve Sailer July 8, 2014 at 3:00 am

From an interview with John List, Homer J. Livingston Professor of Economics at the University of Chicago:

RF: Your paper with Roland Fryer and Steven Levitt came to a somewhat ambiguous conclusion about whether stereotype threat exists. But do you have a hunch regarding the answer to that question based on the results of your experiment?

List: I believe in priming. Psychologists have shown us the power of priming, and stereotype threat is an interesting type of priming. Claude Steele, a psychologist at Stanford, popularized the term stereotype threat. He had people taking a math exam, for example, jot down whether they were male or female on top of their exams, and he found that when you wrote down that you were female, you performed less well than if you did not write down that you were female. They call this the stereotype threat. My first instinct was that effect probably does happen, but you could use incentives to make it go away. And what I mean by that is, if the test is important enough or if you overlaid monetary incentives on that test, then the stereotype threat would largely disappear, or become economically irrelevant.

So we designed the experiment to test that, and we found that we could not even induce stereotype threat. We did everything we could to try to get it. We announced to them, “Women do not perform as well as men on this test and we want you now to put your gender on the top of the test.” And other social scientists would say, that’s crazy — if you do that, you will get stereotype threat every time. But we still didn’t get it. What that led me to believe is that, while I think that priming works, I think that stereotype threat has a lot of important boundaries that severely limit its generalizability. I think what has happened is, a few people found this result early on and now there’s publication bias. But when you talk behind the scenes to people in the profession, they have a hard time finding it. So what do they do in that case? A lot of people just shelve that experiment; they say it must be wrong because there are 10 papers in the literature that find it. Well, if there have been 200 studies that try to find it, 10 should find it, right?

This is a Type II error but people still believe in the theory of stereotype threat. I think that there are a lot of reasons why it does not occur. So while I believe in priming, I am not convinced that stereotype threat is important.
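
List’s closing arithmetic is easy to check. A minimal sketch in Python, using the round numbers from the quote and the conventional 5% significance level; the assumption that the hunted effect is truly absent is purely for illustration:

```python
# Back-of-the-envelope check of List's "200 studies, 10 findings" point.
# Illustrative assumption: the effect being hunted does not actually exist.

alpha = 0.05        # conventional per-study false-positive rate
n_attempts = 200    # List's hypothetical number of attempts

expected_flukes = alpha * n_attempts
print(expected_flukes)  # 10.0

# If the ~190 null results are shelved and only the ~10 flukes get
# published, the literature shows "10 papers that find it" even though
# the true effect is zero -- the publication-bias mechanism List describes.
```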

Reply

Brian Donohue July 8, 2014 at 9:38 am

All I can say is I really love UChicago sometimes. The life of the mind. The spirit of open inquiry. It’s not just bullshit. I can’t imagine a clown like Mitchell ever getting traction there.

Reply

China Cat July 8, 2014 at 2:27 pm

Alumna (Physical Sciences Division) here, still live in the area.

The place is poaching its brand now, I’m afraid. Everything but the hard sciences and econ has gone to shit.

Reply

Steve Sailer July 8, 2014 at 3:06 am

One of the most popular social psychology studies of the Malcolm Gladwell Era has been Yale professor John Bargh’s paper on how you can “prime” students to walk more slowly by first having them do word puzzles that contain a hidden theme of old age through the inclusion of words like “wrinkle” and “bingo.” The primed subjects then took one second longer on average to walk down the hall than the unprimed control group. Isn’t that amazing!

This finding has electrified the Airport Book industry for years: Science _proves_ you can manipulate people into doing what you want them to! Why you’d want college students to walk slower is unexplained, but that’s not the point. The point is that Science proves that people are manipulable.

Now, a large fraction of the buyers of Airport Books like “Blink” are marketing and advertising professionals, who are paid handsomely to manipulate people, and to manipulate them into not just walking slower, but into shelling out real money to buy the clients’ products.

Moreover, everybody notices that entertainment can prime you in various ways. For instance, well-made movies prime how I walk down the street afterwards. For two nights after seeing the Coen Brothers’ No Country for Old Men, I walked the quiet streets swiveling my head, half-certain that an unstoppable killing machine was tailing me.

So, in an industry in which it’s possible, if you have a big enough budget, to hire Sir Ridley to direct your next TV commercial, why the fascination with Bargh’s dopey little experiment?

One reason is that there’s a lot of uncertainty in the marketing and advertising game. Nineteenth-century department store mogul John Wanamaker famously said that half his advertising budget was wasted; he just didn’t know which half.

Naturally, social psychologists want to get in on a little of the big money action of marketing. Gladwell makes a bundle speaking to sales conventions, and maybe they can get some gigs themselves.

But why do the marketers love hearing about these weak tea little academic experiments, even though they do much more powerful priming on the job? I suspect one reason is because these studies are classified as Science, and Science is permanent. As some egghead in Europe pointed out, Science is Replicable. Once the principles of Scientific Manipulation are uncovered, then they can just do their marketing jobs on autopilot. No more need to worry about trends and fads.

Reply

t. collins July 8, 2014 at 11:11 am

Of course, what the marketers don’t realize is that if it were science, they wouldn’t be able to do their jobs on autopilot. Someone would write an algorithm, and they would no longer have a job.

Reply

Steve Sailer July 8, 2014 at 3:08 am

Even legitimate priming studies are likely to stop replicating after a while because they basically aren’t science. At least not in the sense of having discovered something that will work forever.

Instead, to the extent that they ever did really work, they are exercises in marketing. Or, to be generous, art.

And, art wears off.

The power of a work of art to prime emotions and actions changes over time. Boredom sets in and people look for new priming stimuli.

So, let’s assume for a moment that Bargh’s success in the early 1990s at getting college students to walk slow wasn’t just fraud or data mining for a random effect among many effects. He really was priming early 1990s college students into walking slow for a few seconds.

Is that so amazing?

Other artists and marketers in the early 1990s were priming sizable numbers of college students into wearing flannel lumberjack shirts or dancing the Macarena or voting for Ross Perot, all of which seem, from the perspective of 2013, a lot more amazing.

Overall, it’s really not that hard to prime young people to do things. They are always looking around for clues about what’s cool to do.

But it’s hard to keep them doing the same thing over and over. The Macarena isn’t cool anymore, so it would be harder to replicate today an event in which young people are successfully primed to do the Macarena.

So, in the best case scenario, priming isn’t science, it’s art or marketing.

Reply

msgkings July 8, 2014 at 5:10 am

This is some of the best posting I’ve ever seen from Sailer. Very good work.

Reply

Art Deco July 8, 2014 at 8:39 am

wearing flannel lumberjack shirts or dancing the Macarena or voting for Ross Perot

What’s ‘amazing’ about flannel shirts? What’s ‘amazing’ about voting for the very accomplished Mr. Perot when half his opposition consisted of a pair of grifters like the Clintons?

Reply

Aidan July 8, 2014 at 3:13 am

Pity the poor social scientist, forced to carry out qualitative research by quantitative means, because the only funding available is for work that can be represented on an Excel spreadsheet.

Reply

Turkey Vulture July 8, 2014 at 4:58 am

Hopefully this is intentionally terrible logic, meant to be used to teach undergraduates a valuable lesson about the scientific method.

Reply

Turkey Vulture July 8, 2014 at 5:18 am

I am going to conduct an experiment to see if the name Jason Mitchell primes people to perform poorly on tests of logical reasoning. I will keep running this experiment, with minor methodological tweaks, until I eventually get a p < .05. Then I will publish my findings and thereby establish this irrefutable truth against all future failed replications.
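
The joke tracks a real mechanism: optional stopping. A minimal simulation of the “rerun until p < .05” strategy, assuming a true effect of exactly zero; the study design and all numbers here are invented for illustration, not taken from any actual experiment:

```python
import math
import random
import statistics

random.seed(1)

def one_study(n=30):
    """One two-group 'experiment' whose true effect is exactly zero;
    returns an approximate two-sided p-value from a z-test."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
    z = (statistics.mean(a) - statistics.mean(b)) / se
    # normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

attempts = 0
while True:
    attempts += 1
    if one_study() < 0.05:
        break
print(f"'significant' result obtained on attempt {attempts}")

# At a 5% false-positive rate per attempt, significance arrives after
# about 20 tries on average, even though the effect is pure noise.
```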

Reply

Steve Sailer July 8, 2014 at 5:25 am

The term “Jason Mitchell, Harvard professor” is quite likely to prime social psychology grad students to shut up about their doubts and parrot the party line if they too want a shot at being a Harvard professor.

Reply

Michael H. July 8, 2014 at 5:23 am

If Jason Mitchell hopes to draw attention to the arguments he is making here, he needs to up his rate of using character strings like “neoliberal,” “Nice Guy,” “decoherence,” and “cold fusion.”

Reply

andrew' July 8, 2014 at 5:39 am

Selection bias on both sides?

If results (and complete methods) were infinite would we have this problem? If egos were minimal?

Knowing why an experiment doesn’t work seems useful.

Reply

andrew' July 8, 2014 at 5:43 am

Response from original author: “Well, you didn’t wear MY lucky underwear that I haven’t washed (or changed) since I got the experiment to work one time!”

Reply

ummm July 8, 2014 at 6:01 am

Nassim Taleb, Gladwell, and Daniel Kahneman present the mundane or obvious as revelations, with ‘evidence’ to support their own confirmation biases that all humans are equal, even though biologically they aren’t. The overarching leftist theme in these books is that the high-IQ experts are wrong and cannot see the forest for the trees, although the concepts of tail risk are not new to the people who actually study such matters. The bulk of these books is pseudoscience, overgeneralizations, and anecdotal evidence.

Reply

Anon. July 8, 2014 at 6:45 am

If this guy is at Harvard, one cannot help but wonder how bad the situation is at “lesser” institutions.

Reply

ummm July 8, 2014 at 6:50 am

Harvard places great emphasis on high IQ in admissions, but my guess is that the research output is probably no better or worse than anywhere else. The calculus PDFs at Miami U are not much different from the MIT ones.

Reply

Marian Kechlibar July 8, 2014 at 10:28 am

High IQ is no defense against toxic memes.

My favorite example: Ayatollah Khomeini was said to be a very smart guy.

Reply

Anon. July 8, 2014 at 6:47 am

Also I can’t believe this idiot is trying to cite Quine in support of this ridiculous thesis.

A little knowledge is a dangerous thing.

Reply

David H. July 8, 2014 at 11:49 am

He also cited Popper, but went on to say, in all seriousness:

“Thus, negative findings—such as failed replications—cannot bear against positive evidence for a phenomenon.”

I can’t help but think of Kevin Kline’s character in A Fish Called Wanda, who very confidently thought he understood Nietzsche.

Reply

mofo. July 8, 2014 at 4:01 pm

Apes don’t read philosophy.

Reply

Steve Sailer July 8, 2014 at 7:21 am

The fundamental problem with the human sciences at present is that what is most replicable is that which is most forbidden to acknowledge: that what consistently matters, time and again, are genetic differences represented by race and sex. Everybody kind of deep down knows that’s true because the evidence is so overwhelming. But we gang up and punish people who come out and say it, like Larry Summers, James D. Watson, and Jason Richwine.

Ruling out the really big factors from the human sciences has two effects: fields like social psychology are reduced to trivialities like … can you prime college students to walk a little bit slower one time? Worse, the field is morally corrupted by the demands to be oblivious to the obvious and punish the honest. Not surprisingly, it therefore attracts conmen like Stapel and apparatchiks like Mitchell.

Reply

Rahul July 8, 2014 at 7:35 am

So if we let them focus on sex & race, the conmen would go away?

I really don’t think this sort of crappy research has much to do with either sex or race.

Reply

Steve Sailer July 8, 2014 at 8:52 am

Professor Mitchell strenuously defends the virtue of Stereotype Threat against those who can’t replicate it.

Reply

ummm July 8, 2014 at 7:36 am

The TED effect aggrandizes the mundane and stultifies the truth. So we get this mealy-mouthed exchange of politically correct ideas that adds nothing to our repertoire of understanding.

Reply

Steve Sailer July 8, 2014 at 9:33 am

It’s the interplay of political correctness and corporate lucre that’s been particularly damaging to the human sciences. How much money does Malcolm Gladwell make per IQ point addressing sales conventions vs. how much does Charles Murray make? Obviously Murray would have more valuable things to tell corporations than Gladwell has, but how many corporations would invite Murray, given that he is “controversial”? In contrast, Gladwell is a beloved dimwit.

Reply

Art Deco July 8, 2014 at 10:02 am

Did they invite James Flynn?

Reply

Brian Donohue July 8, 2014 at 9:45 am

I disagree. You are fascinated by group differences. I’m much more interested in human universals and what this science can tell us here, which is a lot, but not the sort of stuff that gets you going.

Reply

Steve Sailer July 8, 2014 at 9:55 am

The group differences are science, in the sense of replicability. The priming stuff is, at best, cultural. And culture changes over time, so even when the original experiment wasn’t careless or fraudulent, it’s likely to eventually stop replicating as people get bored by whatever primed them in the past.

Reply

Brian Donohue July 8, 2014 at 10:01 am

Yeah, I read your theorizing. No offense, but Kahneman is better.

As far as group differences, you see where this is heading, right? I generally don’t see the usefulness of this information in virtually any interaction I ever have with any individual. Once the goal of neutralizing ‘disparate impact’ policy is achieved, all that will be left is awkward rancor, calling for tact. Prepare for that day.

Reply

Steve Sailer July 8, 2014 at 11:06 am

“I generally don’t see the usefulness of this information in virtually any interaction I ever have with any individual.”

There’s this thing called “public policy.” For example, immigration policy.

There’s a general bias in today’s culture in favor of ignorance. Personally, however, I subscribe to the motto of Faber College in “Animal House:” “Knowledge Is Good.”

Brian Donohue July 8, 2014 at 11:36 am

When you argue against ‘disparate impact’, you have my sympathy. But with your vision of, I don’t know, a table of races with a list of things like IQ driving immigration policy, you isolate yourself with the kooks.

Marian Kechlibar July 8, 2014 at 10:04 am

Can you study universals without studying differences at the same time? These two are logically complementary. Given that we’re far from being sure what is universal and what is not, your activity would probably consist of unearthing group differences all the time.

Reply

Brian Donohue July 8, 2014 at 10:08 am

Yes of course. My chief interest here is understanding human nature, not so much the various appendices.

Reply

Marian Kechlibar July 8, 2014 at 10:18 am

Aren’t humans too diverse to have a common nature?

I mean, you will find, say, misers in all corners of the world, but is that information going to be “useful in virtually any interaction with any individual”?

Besides, you will most likely find that even the frequency of misers is location-dependent.

Brian Donohue July 8, 2014 at 10:24 am

No, silly. Of course humans have a common nature. So do mammals. So do tetrapods. So do chordates.

Now, human nature encompasses a lot of diversity. My IQ is 30 points higher than my brother’s, even though we share half of our genes!

Marian Kechlibar July 8, 2014 at 10:26 am

Maybe you are right, but if humans have a common nature, then I am unable to find it. Every rule I can think of has too many anecdotal exceptions – just from my short life.

Brian Donohue July 8, 2014 at 10:30 am

Bipedalism, facility with language, extraordinary manual dexterity, ability to step outside the moment and plan, difficulty in reasoning about very small or very large quantities, etc. etc.

Steve Sailer July 8, 2014 at 11:45 am

But people aren’t really that interested in bipedalism as a topic of continuing conversation, as opposed to things like, “Is my country going to win the World Cup?”

Edward O. Wilson had some great lists in “On Human Nature” back in the 1970s about behavioral traits that humans share in common with ants and traits that humans don’t. I used them in conversation frequently for several years, until I noticed that most people weren’t really that interested in human universals (not to mention ant universals), and kept switching the topic on me to things like, “So, who do you think’s going to win the big game?”

Brian Donohue July 8, 2014 at 11:50 am

C’mon Steve, you see my progression to “difficulty in reasoning about very small or very large quantities etc etc.” I mean, you said you read Kahneman. My rhetorical point in the comment was to start with obvious human universals, to assist Marian, who couldn’t think of any. My charitable interpretation is that you are being obtuse rather than mendacious here.

Steve Sailer July 8, 2014 at 12:11 pm

Who besides economists didn’t know that most people aren’t all that good at reasoning with big or small (or, for that matter, medium-sized) numbers? Kahneman’s puzzlers are extremely reminiscent of Ripley’s Believe It or Not comic strips from over a half century ago. When I was a kid, I loved all this tricky stuff and the optical illusions that wound up in Thinking, Fast and Slow 45 years later. They should have given Ripley the Economics Nobel.

Brian Donohue July 8, 2014 at 12:21 pm

Right there, when you add “(or for that matter, medium-sized) numbers?” you reveal your ignorance of the point Kahneman is making, which underscores shortcomings in fast reasoning outside of a ‘normal’ range. But yeah, I think your Kahneman elevator speech pretty much covers his decades of research. Carry on.

Rahul July 9, 2014 at 5:02 am

@Brian

The Sailer school of thought relies heavily on anecdotes, common sense & nostalgia as a substitute for hard facts & quantitative analysis.

Thank goodness he peddles sociology & not bridge building.

Steve Sailer July 8, 2014 at 11:09 am

“Can you study universals without studying differences at the same time?”

Right, it’s like saying, “I’m going to build a digital computer that only uses 1s and doesn’t notice 0s.” Knowledge requires contrasts.

Reply

Brian Donohue July 8, 2014 at 11:30 am

Very weak analogy. Lack of good faith suspected.

Steve Sailer July 8, 2014 at 11:46 am

Brian,

Some people have thought about this kind of thing a long time before you ever did. You could learn from them.

Brian Donohue July 8, 2014 at 11:53 am

Steve,

No doubt. With all due respect, you are not one of them. I’m sure I can benefit from learning from others here. How about you?

Rahul July 8, 2014 at 12:06 pm

“There’s a general bias in today’s culture in favor of ignorance”

As compared to when? Exaggerating a bit, Steve?

You really stop at nothing when you want to make your pet point, don’t you?

Steve Sailer July 8, 2014 at 12:15 pm

As compared to, say, the 1950s into the 1970s. Well-educated people expressing pride in their ignorance about important topics is much more common today.

Rahul July 8, 2014 at 12:43 pm

I disagree. But I’m not sure how we settle this. I think we as a population are far more knowledgeable today than 50 years ago.

Steve Sailer July 8, 2014 at 1:32 pm

You’re missing the point: people boast more today about being unworldly; in the past they liked to appear more clued-in.

Here’s a good example where the same question was asked in a survey many decades apart: as Joel Stein notes, the facts haven’t changed, but the public claims to be less aware of reality in the 2000s than in the 1960s:

http://isteve.blogspot.com/2008/12/joel-stein-asks-how-jewish-is-hollywood.html

Silas Barta July 8, 2014 at 6:05 pm

How dare you criticize my plan for a unary computer, fascist!

S July 8, 2014 at 7:49 am

“Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way. Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them.”

I call bullshit. Failed replications give you some information, even if they are not flawless, as the original study was not flawless either. The more perfect an experiment has to be to replicate an effect, the less likely it is that the effect is real and meaningful. Let’s say you have 5 failed independent replication attempts of a “true” effect. Doesn’t that tell you something about the size of the effect?
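
To put rough numbers on that last question: a minimal sketch, assuming the five replication attempts are independent and each has the same statistical power against the originally claimed effect size (the power values are hypothetical):

```python
# If the effect is real and each independent replication detects it with
# probability `power`, all five fail with probability (1 - power)^5.

for power in (0.8, 0.5, 0.3, 0.1):
    p_all_fail = (1 - power) ** 5
    print(f"power = {power:.0%}: P(5 straight failures) = {p_all_fail:.3f}")

# power = 80%: P(5 straight failures) = 0.000
# power = 50%: P(5 straight failures) = 0.031
# power = 30%: P(5 straight failures) = 0.168
# power = 10%: P(5 straight failures) = 0.590
```

Five straight failures is wildly unlikely if the effect is as large as originally claimed (high power), so under these assumptions the failures push you toward “small, flimsy, or absent.”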

Reply

Marian Kechlibar July 8, 2014 at 10:10 am

To your last question: not really, the replicators might have made the same mistake. For random mistakes, that would be very improbable, but for culture-specific biases etc., that could happen very easily.

To the topic: it is personally understandable that no one wants to see their own results debunked. But the main catch is politics. If long-term dogmas such as the above-mentioned Stereotype Threat are ephemeral, the livelihoods of many are at stake, and the credibility of others as well. This is bound to create a storm, and crazy articles like the one above are the mildest thing that could happen. In the times of Galileo and Bruno, the powers-that-be reacted by tying the heretics to another type of stake. We have progressed a bit since then, at least in this regard.

Reply

Steve Sailer July 8, 2014 at 11:10 am

People in America are only forced to resign.

Reply

Art Deco July 8, 2014 at 8:42 am

1. Assemble a bibliography of the man’s writings.

2. Have a look at Web of Science and see who has cited his writings. I’ll wager you’ll be able to build a bibliography of people whose work vitiates his, and quite a lot of what he’s spent his adult life studying.

Reply

bellisaurius July 8, 2014 at 9:53 am

This topic reminds me of my philosophy on instrumentation: A man with one clock always knows what time it is.

A man with two clocks is never quite sure.

I get why replication matters, but if one is trying to do the exact same experiment to confirm a hypothesis, then all you’re doing is giving yourself a bigger dataset. To be sure about something you need to have a secondary means of testing that verifies the hypothesis independently. Or, to put it in non-engineering terms: if I see something that looks like an apple, I might apply my taste buds or nose to confirm that indeed, this is an apple. Two unrelated methods are going to be a lot better than just looking closer at the same darn apple.
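
A toy version of the apple arithmetic (the 10% error rate is an assumed number, purely for illustration): re-running one method re-runs its systematic quirks, while two independent methods must both be fooled at once.

```python
# Toy model: any single method of checking "is this an apple?" is fooled
# with probability 0.1 (an assumed number).

p_fooled = 0.10

# Re-running the identical procedure: a systematic error shared by both
# runs is not diluted by repetition (worst case, the repeat is fooled
# whenever the original run was).
p_wrong_same_method = p_fooled        # errors perfectly correlated

# Two genuinely independent methods (look, then taste): the conclusion
# is wrong only if both are fooled at once.
p_wrong_independent = p_fooled ** 2   # 0.01

print(p_wrong_same_method, p_wrong_independent)  # 0.1 vs 0.01
```

Under these assumed numbers, the independent check buys an order of magnitude; looking harder at the same apple buys nothing against a shared bias.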

Reply

Roger Sweeny July 8, 2014 at 10:02 am

Neuroskeptic, who is one of the people Mitchell dislikes, has a thoughtful reply:

http://blogs.discovermagazine.com/neuroskeptic/2014/07/07/emptiness-failed-replications/#.U7v51LEUqSo

Reply

Tracy W July 8, 2014 at 10:05 am

He does have a bit of a point – e.g., we know that radios work, but give a first-year engineering class the job of building a radio, and you’ll get a lot of radios that don’t work, because replicating a radio is hard at first. The trainee engineers will have to spend a bunch of time debugging and rebuilding to get a working radio.

On the other hand, that someone has actually built a working radio can be checked far more easily and objectively than the outcome of most psychology lab experiments.

Reply

ChrisA July 8, 2014 at 10:42 am

Tracy – as this thread has just about the most consensus I have ever seen on MR, making it rather a boring echo chamber, I appreciate your attempt to come up with something contrary. But really, replicating radios wasn’t very hard for lots of people to do once they had been invented. In fact there were large numbers of people building radios not too long after the first ones were made. Why? Because it was very useful technology. To continue this analogy, if some of these psychology results were really significant they would be replicated very quickly, as people would be highly motivated to do so, since lots of money could be made off them. We wouldn’t be debating whether they really existed, we would be using them.

Reply

Steve Sailer July 8, 2014 at 11:14 am

Or if they are really good at priming people into doing things, they go make a lot of money in advertising or entertainment or motivational speaking or management or whatever.

But even that kind of talent doesn’t replicate all the time. Rob Reiner directed seven hit movies in a row to start out his career. That streak ended a couple of decades ago.

Reply

Tracy W July 8, 2014 at 11:22 am

Sure, but debugging a not-working radio is a lot easier than debugging a psychology experiment. Breadboards stay put and don’t complain about being bored or disappear to go to the loo, unlike human subjects.

And even if people could make money off them, would they make money off them by replicating them for scientific papers?

Reply

Steve Sailer July 8, 2014 at 12:19 pm

What’s actually amazing is how much replication there has been of extremely unpopular findings, such as racial differences in intelligence, the things that Jason Richwine was fired from a Republican thinktank for discussing in his Harvard doctoral dissertation. All the incentives in the world exist to debunk this, but nobody has been able to do it. Anybody who could come up with an IQ test with predictive validity that doesn’t have racial gaps would be a superstar, and it’s been like that for a half century. Yet, the findings of racial gaps just replicate endlessly.

That’s science.

Reply

Andrew' July 8, 2014 at 12:11 pm

ChrisA,

I also said he has a point, though not in so many words.

Reply

Rahul July 8, 2014 at 12:39 pm

@Tracy:

I think your analogy is a bit off: say 150 years ago a guy came to you claiming he had invented a radio. You’d be incredulous, so immediately a bunch of you started to build radios using his instructions. You kept failing for years.

Would you believe he had indeed invented this fantastic thing called a radio?

Priors matter. Occasional failure in an established protocol needs to be distinguished from total failure to replicate the first & only report.

Reply

Tracy W July 8, 2014 at 3:37 pm

Probably not. But if I then concluded that radios couldn’t exist then I’d be wrong. Because radios do exist.

Again the problem with applying this to psychology is that a working radio is easier to check the existence of than a psychology experimental result.

Reply

Ballab July 8, 2014 at 10:29 am

Just think like a Bayesian, bro.

Reply

CMOT July 8, 2014 at 10:46 am

You’ve got to keep in mind how social science research is done.

Tiny, highly unrepresentative samples of people are placed in extremely artificial situations. The results are judged and coded by the researchers themselves, who are highly invested in finding ‘good’ results. Trials are run multiple times, often dozens of times, until ‘good’ results are achieved.

Replicators are not nearly as invested in success as the original investigators, so they are running fewer trials and coding the results with less concern for success.

Mitchell’s statement about “the inadequate basis for replicators’ extraordinary claims” gives the game away. Other sciences treat all results as “extraordinary claims” requiring proof. The purpose of social science research is to validate the RESEARCHERS, not ideas. So you can see why he’s upset.

Reply

a Michael July 8, 2014 at 10:50 am

+1

People need to get over themselves and those doing the replications should assume the best in whomever they’re replicating. We all know mistakes happen. Research is hard.

Reply

Brian Donohue July 8, 2014 at 10:56 am

“those doing the replications should assume the best in whomever they’re replicating.”

Nope. Those doing the original research should do everything in their power to disprove any conclusions themselves, and replicators should bring nothing but skepticism to the table. That’s what science is.

Reply

CMOT July 8, 2014 at 11:02 am

You’ve hit the nail on the head.

If you’re not allowed to question it, it isn’t science any more.

Reply

mb July 8, 2014 at 11:22 am

The replicators should be very careful about their biases affecting the results in the negative direction. I think this is a fair point by Mitchell.

Reply

Steve Sailer July 8, 2014 at 12:03 pm

Or it could be that the secret sauce of the original experiment isn’t included in the recipe. For example, I’ve long assumed that Stereotype Threat is simply the result of researchers priming black students through all sorts of subtle and not-so-subtle hints that they don’t want them to work hard on this no-stakes test. Hey, don’t kill yourself on this, black students. It’s just a meaningless little hard test.

So, maybe a no-BS guy like John List can’t replicate Stereotype Threat because students feel that you should do your best on anything a true scientist like Professor List wants you to do.

Who knows?

Jeff R. July 8, 2014 at 12:50 pm

(To Steve Sailer)
Wait, shouldn’t the people administering the tests have had absolutely no idea what the experiment was actually about? Or are you saying that the trials aren’t done at least double-blind, which would be an even stronger argument for their worthlessness than this replication business, which is saying a lot…

Bill July 9, 2014 at 1:16 pm

@ Jeff R

Are you serious? You think that maybe social psych experiments are double blinded? Wow.

a Michael July 8, 2014 at 1:04 pm

By “assume the best,” I mean, let’s not impugn people’s intentions — e.g., accuse them of purposely rigging the experiment or engaging in data mining.

Yes, be skeptical of the results and the methods, but going around accusing people of foul play will only discourage people from embracing replication, which prevents us from moving from the current equilibrium (of I won’t replicate you if you don’t replicate me) to a more ideal one where replication is the norm and not seen as a personal insult.

Reply

Eric Hammer July 8, 2014 at 1:37 pm

I agree, and would only add another step: experiments that have negative (null-hypothesis) results are shelved, because they probably won’t get published, which is largely the point of the exercise. (Unless you have large cash grants, in which case the point becomes getting the results the grant awarders want to see.)

Reply

Bemac July 8, 2014 at 10:47 am

I used to think anecdotes weren’t data.

Now I hear the data are just anecdotes about what happened this one time somebody did an experiment.

Reply

a Michael July 8, 2014 at 10:48 am

Addendum: “The push to replicate findings could shelve promising research and unfairly damage the reputations of careful, meticulous scientists, says Mina Bissell.”

Why does it have to unfairly damage reputations? If you find a result due to random luck, it shouldn’t be “damaging” or insulting for someone else to not find that same result. Certainly, many findings will also fail to replicate due to mistakes on the part of the researcher, but can’t we be grown-ups about this? Let’s assume the best in people and be open to the fact that even the most careful, meticulous scientist will make mistakes. Surely, every scholar is well aware of how easily things can go wrong, especially in complicated experiments with human subjects or in observational analyses involving massive datasets.

What we need is to develop a healthy, pro-replication culture — and get over our egos.

Reply

Steve Sailer July 8, 2014 at 11:34 am

I’m struck by how social psychologists fail to make obvious theoretical arguments excusing their replication failures: like, humans in social situations often change. Historians aren’t asked to replicate World War I to prove their theories. People learn from their mistakes, or they forget the lessons of the past, or they just get bored, or whatever. People in groups change.

If you can’t step into the same river twice, you sure can’t experiment in the same society twice. For example, last spring I ate in a diner near Columbia U. and two undergrads were having a date in the next booth. The boy talked endlessly at the girl. He was quite annoying, but his 15-minute-long demolition of the ideas of Malcolm Gladwell was on the money. If that kid got roped into an attempted replication of the Bargh experiment, he’d probably tell the other subjects, “Hey, this is that stupid priming experiment in ‘Blink!’ Let’s run down the hallway to the elevator just to screw with their heads.”

Reply

Dan July 8, 2014 at 11:03 am

So the external validity of the experiments he is talking about is so low that they can’t even be replicated in a similar lab situation…nice…

Reply

JasonL July 8, 2014 at 11:19 am

“Whether they mean to or not, authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues. Targets of failed replications are justifiably upset, particularly given the inadequate basis for replicators’ extraordinary claims.”

I mean, what? A person who writes this is a person who cares not a bit about the scientific method. Whose extraordinary claims again? Could anyone ever argue more persuasively that humility and rigor are desperately needed?

Reply

replication rules July 8, 2014 at 11:23 am

More from John List on this issue (“One Swallow Does not Make a Summer: New Evidence on Anchoring Effects,” American Economic Review (2014), Volume 104, Issue 1, Pages 277-290):

In the model we show that the common benchmark of simply evaluating p-values when determining whether a result is a true association is flawed. Two other considerations—the statistical power of the test and the fraction of tested hypotheses that are true associations—are key factors to consider when making appropriate inference. The common reliance on statistical significance as the sole criterion leads to an excessive number of false positives. The problem is exacerbated as journals make ‘surprise’ or counter-intuitive results necessary for publication. But, by their very nature such studies are most likely not revealing true associations—not because of researcher malfeasance, merely because of the underlying mechanics of the methods.

While this message is pessimistic, there is good news: our analysis shows that a few independent replications dramatically increase the chances that the original finding is true. As Fisher (1935) emphasized, a cornerstone of the experimental science is replication. Inference from empirical exercises can advance considerably if scholars begin to adopt concrete requirements to enhance the replicability of results, as for instance starting to actively encourage replications within a given study.

The case seems pretty clear: these words from List are spot on.
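
The mechanics List describes can be sketched in a few lines. This is a minimal illustration of the power-and-prior logic, not code from the paper; the 5% prior, the 50% power, and the treatment of each replication as an independent significant result are all assumptions:

```python
# Post-study probability that a significant finding reflects a true effect,
# given the prior fraction of true hypotheses and the test's power.

def p_true_given_significant(prior, power, alpha=0.05):
    """P(effect is real | p < alpha) for one study, by Bayes' rule."""
    true_pos = prior * power          # real effects that reach significance
    false_pos = (1 - prior) * alpha   # null effects that fluke through
    return true_pos / (true_pos + false_pos)

# A 'surprise'-driven literature: only 1 in 20 tested hypotheses is real.
p, power = 0.05, 0.5

for study in range(1, 4):   # the original finding plus two replications
    p = p_true_given_significant(p, power)
    print(f"after significant result #{study}: P(true) = {p:.2f}")

# after significant result #1: P(true) = 0.34
# after significant result #2: P(true) = 0.84
# after significant result #3: P(true) = 0.98
```

Under these assumed numbers, a lone significant result is more likely false than true, while two independent significant replications take the same finding to near-certainty; that is List’s “good news.”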

Reply

Yancey Ward July 8, 2014 at 11:35 am

I don’t think I have ever read a dumber argument overall. Not all attempts at replication are made with the intent of disproving the original finding, and one can’t simply ignore the fact that the original experiment is just as likely to have been conducted imperfectly.

Reply

James Oswald July 8, 2014 at 11:58 am

The writer appears to be more concerned with preserving the status of the original researchers than discovering truth. I’m glad this article was torn apart by the comments section.

Reply

dwb July 8, 2014 at 12:27 pm

Please don’t bother me with silly things like replication; they may diminish my research funding!

If it is *that hard* to replicate an experiment, then I question the usefulness of the result. Sure, under highly controlled laboratory conditions I can get a day-old dead turkey to spontaneously combust. Not very useful; those conditions don’t exist in the real world 99.9999% of the time.

Replication (or lack thereof) with some random error in the protocol demonstrates how sensitive a result is to the initial conditions and protocol. Moreover, we need to know if the result itself was real (i.e., cause and effect), or mere luck.

If you cannot replicate a result, it’s probably luck, or the result is so sensitive to initial conditions that it is worthless anyway. Either way, please don’t gripe about this science thingy getting in the way of your $$.

Reply

Gordon Mohr (@gojomo) July 8, 2014 at 1:45 pm

Shorter Professor Mitchell: the results of the first researchers to tease out an interesting result are woven of the most magnificently fine threads. Later researchers who can’t see the same effects must be unfit for their jobs or unusually stupid.

(Also, please keep any children with less than 13 years of formal schooling away from the laboratories.)

Reply

Richard July 8, 2014 at 2:53 pm

Has anyone looked into whether this paper is a hoax? It sure sounds like one.

Reply

Dan Weber July 8, 2014 at 6:49 pm

Sokal?

Reply

bxg July 8, 2014 at 7:58 pm

What consensus!

Except, it seems, our host: TC: “I don’t quite buy it but a useful counter-tonic”

An argument you don’t _quite_ buy usually has some substantial merit, just not enough to buy. I’d love it if TC would say a few words as to what the merit was. And in spite of its deficiencies, it’s _useful_? Any chance TC could hint at a use[*] or two?

[*] I don’t think the use he has in mind is: give it to a student and challenge her to find as many logical fallacies or instances of bogus argumentation as she can.

Reply

Bill July 9, 2014 at 1:32 pm

There are multiple ways to read TC. You could read him as Steve Burton does, as a poseur constantly in way over his head: searching for a way to say something without embarrassing himself, plagued in this task by the fact that he never knows what he is talking about. Alternatively, you could read him as Steve Sailer does, as a cynical courtier in the Palace of PC: showing us that he really knows what is going on by the delicate arch of his eyebrow.

It’s up to you. Did TC screw the pooch by saying the (utterly inane) vaguely positive thing he did? Or is TC showing us he knows what a pile of crap social psych is by the act of linking such a transparently idiotic argument at all? And is the inane positive comment then just the price of staying a courtier in the Palace of PC?

Reply

Popeye July 10, 2014 at 12:20 am

Mitchell’s statement about “the inadequate basis for replicators’ extraordinary claims” gives the game away. Other sciences treat all results as “extraordinary claims” requiring proof. The purpose of social science research is to validate the RESEARCHERS, not ideas. So you can see why he’s upset.

CMOT wins. There’s a ton of pressure on researchers to produce publishable results, and much less pressure to produce genuine understanding. Successful researchers are the champions of producing results. Ideally they would also be the champions of producing and furthering understanding, but Mitchell makes it clear where he stands.

Reply
