Walking Fast and Slow

In a famous paper, psychologist John Bargh and collaborators gave students at NYU a test very similar to the one described by Malcolm Gladwell in Blink:

In front of you is a sheet of paper with a list of five-word sets. I want you to make a grammatical four-word sentence as quickly as possible out of each set. It’s called a scrambled-sentence test. Ready?

  1. him was worried she always
  2. are from Florida oranges temperature
  3. ball the throw toss silently
  4. shoes give replace old the
  5. he observes occasionally people watches
  6. be will sweat lonely they
  7. sky the seamless gray is
  8. should not withdraw forgetful we
  9. us bingo sing play let
  10. sunlight makes temperature wrinkle raisins

The students were then sent to do another test in an office down the hall. Unbeknownst to them, walking the hall was the real experiment. Scattered in the sentences above are words like “worried,” “Florida,” “old,” “lonely,” “gray,” “bingo,” and “wrinkle.” Bargh reported that students who had been primed with these words took significantly longer to walk down the hall than those not primed with the “old” words.

In the original study there were only 60 participants and the subjects were timed with a stopwatch. A new paper doubles the sample size and uses more accurate infrared sensors. You will probably not be surprised to learn that the new paper fails to replicate the priming effect. As we know from Why Most Published Research Findings are False (also here), failure to replicate is common, especially when sample sizes are small. I haven’t yet described the real surprise, however.
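A quick simulation can make the point about small samples concrete. This is a minimal sketch, not anything from either paper. It assumes a hypothetical true effect of half a standard deviation (an assumed value, not Bargh's estimate), normally distributed walking times, and a rough critical value of 2.0 for Welch's t test. It then counts how often a study of each size would reach significance.

```python
import math
import random

def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    ma, va = mean_and_variance(a)
    mb, vb = mean_and_variance(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def simulated_power(n_per_group, effect_size, trials=2000, crit=2.0, seed=0):
    """Share of simulated studies in which |t| exceeds crit.

    effect_size: ASSUMED true difference in means, in standard-deviation units.
    crit: rough two-sided 5% cutoff for samples of this size (an approximation).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        primed = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if abs(welch_t(primed, control)) > crit:
            hits += 1
    return hits / trials

# Hypothetical effect of 0.5 SD; group sizes mirror 60 vs. 120 total participants.
small = simulated_power(30, 0.5)
large = simulated_power(60, 0.5)
print(f"estimated power, 30 per group: {small:.2f}; 60 per group: {large:.2f}")
```

Under these assumptions the smaller design misses a real effect noticeably more often than the larger one. Changing `effect_size` shows how strongly the answer depends on that guess.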

The authors of the new paper, Doyen et al., then took the experiment meta: they ran it again, but this time they told half of the people supposedly “running” the experiment that they expected the participants to walk slower, and told the other half that they expected the participants to walk faster. (A confederate provided evidence for this effect.) In the second experiment they again used the infrared sensors, but they also asked the nominal experimenters to use a stopwatch, because the sensors were said to be new and sometimes unreliable.

In the second experiment Doyen et al. were able to replicate the Bargh results. Namely, when using the stopwatch, the nominal experimenters reported that the group primed to walk slow did walk slow and they reported that the group primed to walk fast did walk fast. The results, however, were not entirely due to subtle experimenter bias because in the slow prime case the infrared sensors also found that the slow-primed group walked slow. The infrared sensors, however, did not report an increase in speed when the nominal experimenters expected an increase in speed.

Thus, the old-slow priming results appear to be due to a subtle mix of experimenter bias and standard priming, which is cued or amplified via experimenter signaling. Given what are still relatively small sample sizes (50-60), this last conclusion should also be taken provisionally.

Important Addendum: Bargh has written a nasty attack on the new paper, the journal that published the paper, and Ed Yong, who blogged the new paper for Discover Magazine. Bargh’s attack is a model of how not to respond to criticism and new information. Ed Yong discusses Bargh’s response here. Like Yong, I am dismayed that Bargh quotes the new paper inaccurately. In his attack, Bargh also says things such as that the overuse of elderly-related items reduces the effect of the prime. Yet in the methods paper he cites (and wrote), he says that more prime stimuli generally result in bigger effects (p. 11; effects can vary if the subjects consciously recognize the prime, a factor that the new paper tests). Bargh also entirely glosses over the main point, which is that the authors did find priming effects when the experimenter knew and expected the effect to occur. Note that, given the subtlety of the effects, any experimenter bias appears to be entirely unintentional, and Doyen et al. never argue otherwise.


"Bargh reported that students who had been primed with these words took significantly longer to walk down the hall"

And someone, somewhere, actually believed that?

That ranks highly among the dumbest things I have ever heard.

The Bargh paper seems gated. How large was the speed change they reported? What was their significance statistic?

Don't confuse "weird" with "dumb." Priming itself is a very real effect: http://en.wikipedia.org/wiki/Priming_(psychology)

Watch out, Eric Barker!

The scrambled sentence test also won several poetry awards.

Thread winner.


That was the best laugh I have had in the last week.

Funny, my main reason to visit Florida in the past couple of decades has been to escape the icy North to watch and play tennis. The words in the test that caught my eye were:

Florida temperature ball throw toss shoes observes people watches sweat play let sunlight

The Doyen et al. study does not cluster standard errors at the level of the individual experimenter, and so it overstates the significance of Experiment 2. Clustering standard errors allows for the possibility of intra-experimenter correlation (e.g. some people are faster at the stopwatch), which may still exist despite not being statistically significant in a sample of only 5 observations.
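One way to see how much this could matter is the Kish design effect, a standard approximation for how within-cluster correlation inflates variance: DEFF = 1 + (m - 1) * ICC, where m is the number of participants timed by each experimenter and ICC is the intraclass correlation. This sketch is not a reanalysis of Doyen et al.; every number in it is hypothetical. A proper analysis would use cluster-robust standard errors rather than this equal-cluster-size shortcut.

```python
import math

def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect: variance inflation from equal-size clusters with intraclass correlation icc."""
    return 1.0 + (cluster_size - 1.0) * icc

def clustered_se(naive_se: float, cluster_size: float, icc: float) -> float:
    """Approximate standard error once the clustering is taken into account."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))

# HYPOTHETICAL inputs for illustration only: 10 participants per experimenter,
# a modest within-experimenter correlation of 0.1, and a naive standard error of 0.2 s.
naive = 0.2
adjusted = clustered_se(naive, cluster_size=10, icc=0.1)
print(f"design effect: {design_effect(10, 0.1):.2f}, SE {naive:.2f} s -> {adjusted:.3f} s")
```

Even a small correlation inflates the standard error when each experimenter times many participants, which is why the commenter's concern does not depend on that correlation being significant on its own.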

What if the Bargh hypothesis was true? What if people really did walk slightly more slowly right after encountering a bunch of "old" words (like Florida, for instance)? Why would it be important to know that?

That's what I was thinking. The title of the post could have been "ZMP research".

It could be very important...

Think about the marginal cost of producing this versus reducing queue size in various places...

One of the hottest areas in experimental psych right now is the study of the effects of subconscious priming in many, many different contexts, for example:


The Bargh study is one of the foundational experiments in that paradigm. That's why it's potentially a big deal. On the other hand, the phenomenon has been reproduced by so many experimenters in so many different situations, that the failure to replicate the 'walk like an old person' results may not have that much impact on the field (though the effect on Bargh's reputation might be greater).

Phlogiston was replicated many times also. The failure to reproduce it was seminal in disproving it.

By no means am I defending this shoddy research, but priming is closely related to framing, which is important in behavioral finance and economics.

Thinking about replicating or out-meta-ing the Doyen et al paper makes my head spin ...

One interesting factoid was that PLoS ONE published 14,000 articles in 2011, charging $1,350 per article. That's almost 19 million dollars.

Not bad for an online publisher. How much could their costs ever be? Must be a fantastic profit margin.
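The fee arithmetic in the comment above can be checked directly. The assumption that every article paid the full charge is ours, not the commenter's, so this is an upper bound on fee revenue; it says nothing about costs and cannot settle the profit-margin question.

```python
articles_2011 = 14_000   # articles PLoS ONE published in 2011, per the comment above
fee_usd = 1_350          # publication charge per article, per the comment above

# Upper bound: ASSUMES every article paid the full fee (no waivers or discounts).
max_revenue = articles_2011 * fee_usd
print(f"${max_revenue:,}")   # $18,900,000
```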

I shouldn't even put too much thought into the original paper, but people supposedly slowed down their walking pace through a hallway? Did they clutch at their back? Stoop over as they walked? Forget where they were going?

Pay attention to the title of the post. The new book by Khaneman, Thinking, Fast and Slow, reports the Bargh research.

The paper would be important in developing a general theory of how System 1 "unconsciously" controls the "conscious System 2".

The new paper (in PLoS) doesn't show that the effect size they observed is inconsistent with the effect size in the original paper. Whatever you want to call this (such as "failure to replicate") it is not impressive.

The original paper (available ungated here - http://www.yale.edu/acmelab/articles/bargh_chen_burrows_1996.pdf) stated that the primed group had an average walking time about 1 second longer than the non-primed group (primed group - . (refer to the "Results" section on page 237, which is page 8 of the PDF linked to above). The results section of Experiment 1 in the failure-to-replicate paper stated that the average walking time was 6.27 s for the primed group, whereas it was 6.39 s for the non-primed group. (The primed group actually walked faster than the non-primed group by about 0.12 s, but this difference is not statistically significant.)

Though you are correct that "the new paper (in PLoS) doesn't show the effect size they observed is inconsistent with the effect size in the original paper," that is only because the failure-to-replicate paper did not make that comparison, which would have shown that the effect size they observed is inconsistent with original paper. I suspect that the failure-to-replicate paper only reported the statistical significance results, and not the effect size results, because that is the scientific standard for establishing that a result did not arise by chance.
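The disagreement in these two comments is about which null hypothesis to test. A non-significant difference from zero does not by itself show inconsistency with the original effect; that requires testing the new estimate against the original one. The sketch below uses the two means quoted above and the approximate 1-second original effect, but the standard error is a made-up placeholder, so its output illustrates the procedure, not a result. A fuller comparison would also account for the uncertainty in the original estimate.

```python
# Mean walking times quoted above for Doyen et al.'s Experiment 1, in seconds.
primed_mean, control_mean = 6.27, 6.39
observed_diff = primed_mean - control_mean   # negative: the primed group was faster

# Bargh et al. reported roughly a 1-second slowdown for the primed group.
original_diff = 1.0

def z_against(value, observed, se):
    """z statistic for whether `observed` differs from `value`, given its standard error."""
    return (observed - value) / se

# HYPOTHETICAL standard error of the replication's difference in means; the real value
# would have to come from the paper's raw data or reported standard deviations.
se_hypothetical = 0.3

z_vs_zero = z_against(0.0, observed_diff, se_hypothetical)
z_vs_original = z_against(original_diff, observed_diff, se_hypothetical)
print(f"difference = {observed_diff:+.2f} s; "
      f"z vs 0: {z_vs_zero:.2f}; z vs original 1 s: {z_vs_original:.2f}")
```

With a standard error of that size, the same data would be consistent with no effect yet inconsistent with a 1-second effect; with a much larger standard error, neither comparison would be informative.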

How accurate do you trust a person using a manual stopwatch to be? Can a 1-second difference measured by a stopwatch be significant at all? Aren't they mostly tracking the perceptual speed and response of the experimenter?
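Part of the answer can be illustrated with a simulation: random stopwatch error mostly washes out in a group average, while a consistent tilt in the timer's readings does not. This is a minimal sketch under invented assumptions (normally distributed walking times around a hypothetical 7 s, 0.3 s of random timing noise, and a 0.5 s expectation bias); it does not model how Doyen et al. actually collected their data.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def stopwatch_study(n_per_group, timing_noise_sd, expectation_bias, seed=1):
    """Simulate stopwatch timings when priming has NO true effect on walking speed.

    Both groups' true walking times come from the same distribution. The timer adds
    random noise to every reading, plus a constant `expectation_bias` (seconds) to
    readings for the group the timer expects to be slow.
    Returns the measured difference in means (expected-slow minus other).
    """
    rng = random.Random(seed)
    true_mean, true_sd = 7.0, 0.7   # HYPOTHETICAL walking-time distribution, in seconds
    slow = [rng.gauss(true_mean, true_sd) + rng.gauss(0.0, timing_noise_sd) + expectation_bias
            for _ in range(n_per_group)]
    other = [rng.gauss(true_mean, true_sd) + rng.gauss(0.0, timing_noise_sd)
             for _ in range(n_per_group)]
    return mean(slow) - mean(other)

# Random noise averages out; a systematic bias survives any amount of averaging.
unbiased = stopwatch_study(5000, 0.3, 0.0)
biased = stopwatch_study(5000, 0.3, 0.5)
print(f"noise only:       {unbiased:+.3f} s")
print(f"noise + 0.5 bias: {biased:+.3f} s")
```

Because both calls use the same seed, the two runs share identical random draws, so the gap between them is exactly the injected bias.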

I was just reminded of the original paper while reading Khaneman's book. It seemed like an odd result. This is why it's important to pay attention to methodology: when I know there is room for (unintentional) experimenter bias, I do a better job of being properly skeptical of reported results.

Are there any journals (in whatever field) that try to specifically attract attempted replications of previous experiments? It seems like it would be a useful way to encourage more attempts at replication, which would be generally good for any experimental field.


However, as Chris says below, there is further bias because of retaliation.

I am a professional research psychologist at a major university and I have heard from more than one lab that couldn't replicate the original study. These labs didn't publish the results because the costs of publishing a failure to replicate are high (read: pushback from the authors of the original paper) and more importantly the rewards are low: most of the top-tier for-profit journals are not interested in null results, even when they are failures to replicate.

Alex - a worthwhile message to push from your pulpit: granting agencies should reward researchers who publish null results and researchers who submit to journals that do not discriminate against null results (such as PLoS ONE, where this study was published).

The reason such researchers aren't rewarded is that most academics hate rocking the boat. Such researchers are perceived as troublemakers.

What if you hire them and they publish a non-replication of your work?

This is one of those things that make me wonder "if that's not what they do, then WTF are they doing?"

Maybe we need to change the reward and incentive system to retest prior research.

How about a Journal of Non-Replicated Research. You could have some good board members so you could attract written articles.

You could have awards and recognition.

Think of it as the Hunger Games.

There have been many attempts at journals dedicated to null results, but none has ever really caught on. That's why granting agencies need to flex their muscle and reward scientists who submit to them. Without the extra funding boost, the incentives just aren't there.

What you don't need to do is reward people for seeking rewards. They'll seek rewards for free. So, to the extent that grant agencies are reinforcing biases then we need them like a hole in the head.

I think you misunderstood me. The current incentive structure does not reward publishing null results. It's just not worth the effort for many of us scientists. I want the funding agencies to counterbalance this.

How many other papers in that field are going to turn out to be equally bogus?
P.S. Isn't the source of the problem rather reminiscent of Clever Hans, the counting horse? He was a much-quoted warning to psychology experimenters 100 years ago. People don't learn much in some fields, it would seem.

You won't learn if not learning is highly profitable to you.

Hey they did progress from priming horses to priming college kids.

I would think priming horses would be quite a bit harder, so that doesn't seem to be progress.

Perhaps, someone could do the same experiment with a horse trained to kick a sensor when someone crossed a line in the hallway. ;)

Reorder or make sentences with the following words:

1. Marginal Revolution data not biased is graphs?

2. Inconsistent liquidity trap Austrian theology monetary policy and tax cut more deficit?

3. Koch Mercatus funding Cato et tu?

You have now been primed to go to the Brookings, Vox, Economist's View, and Menzie Chinn websites.

Walk carefully.

Hee! Brilliant.



Voodoo science at its best.

My take on all this is a bit different. Here's my blog post on the same subject, and on what we can do about all this going forward: http://bit.ly/xEvqTN

Shouldn't some credit be given to Khaneman and Tversky for their work in this area? They only won a Nobel prize for their work on priming and other biases. Seems their research may be more robust than whoever Gladwell's stealing from this time.

In any case, it is time for my sponge bath. Matlock comes on at 3, and the cat has hidden the remote again.

What are the odds that three separate posters would all misspell Daniel Kahneman's name the same way? Calculate a p value and try to replicate.

The latter two were primed.

The Bargh finding is known not to replicate, see here.

I think the whole experiment is a bit silly, but I personally didn't find Bargh's response offensive.

The only thing I thought was offensive about Bargh's response is the headline, and, at least in print media, columnists don't even write those.


As an old marketing researcher, I can tell you it was a standard dictum in the field that you can get some people to respond however you want them to respond, as long as it's easy for them. If you want them to walk slow, give them clues that you want them to walk slow.

Same with the famous Stereotype Threat. If you want black students to slack off on a meaningless low stakes test, that's not hard to arrange.

As for Kahneman, his Nobel discovery that it's easy to fool people was already known by stage conjurors, conmen, the creators of optical illusions, and many, many others.

I assume the conclusion is "psychology has too much research funding"?

I am not sure if the Bargh post is as 'nasty' as you state, but he does raise some concerns about the differences in testing procedure with references.

I am unable to make any grammatically correct sentences out of those word sets. I also need to lie down and rest.

