Further assorted links

Comments

The cat laps four times a second — too fast for the human eye to see but a blur — and its tongue moves at a speed of one meter per second.

The human eye sees nothing but a blur at 2mph?

A screaming comes across the sky...

As for the New Scientist article, it finally allows us to use all those perfect foresight models we have been developing.

It has happened before, but there is nothing to compare to it now.

I'd put my money on what Rob said.

IANALinguist, but at least the southern hemisphere accents at #3 don't seem very representative of the locale. Particularly the Sydney accent - seemed stagily broad. And the only specifically New Zealand phoneme I could discern after clicking on the Shaky Isles on the map was in the guy's use of the word "red". Perhaps these things are affected by self-consciousness on the part of the speaker.

If #2 is true it is the overthrow of causality. Very unappetizing. I hope for sanity's sake that either their statistical methods are bad or their data is a fraud.

2. Feynman wrote about the likely flaws in this type of study.

"I purposely waited until I thought there was a critical mass that wasn't a statistical fluke," he says.

Like To above, I was struck by that comment as well, for the irony that it essentially proves the opposite of what it claims. It is precisely because they are indeed statistical flukes that he had to wait eight years to collect enough of them to pass even the very low threshold that is peer-reviewed psychology.

As everyone here certainly knows, if you run hundreds of studies over those eight years, then even at a 95% confidence level you'll have plenty of "successes" to choose from when there is absolutely nothing there. The commenter above doesn't go far enough -- this isn't "close to" cherry picking, it is cherry picking.
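To put a rough number on that (a quick sketch; the 300 studies and 1,000 guesses per study are made-up figures for illustration, not anything documented about Bem):

import random

# Simulate many studies of a pure 50/50 guessing task and count how many clear
# p < .05 by chance alone.
def null_study(n_guesses=1000):
    hits = sum(random.random() < 0.5 for _ in range(n_guesses))
    z = (hits - n_guesses / 2) / (n_guesses / 4) ** 0.5  # normal approximation to the binomial
    return z > 1.645                                      # one-sided p < .05

random.seed(0)
lucky = sum(null_study() for _ in range(300))
print(lucky, "of 300 null studies come out 'significant'")  # about 5%, i.e. roughly 15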

Of course he'll have statistical ignorance on his side again if enough people try to recreate his experiments. Obviously the (vast) majority will fail (most of whom we'll never hear about), but if there are hundreds of people looking he can be assured of tens of "successes" to point to as vindication (while, as To points out above, claiming any failures were somehow just not sensitive enough or done incorrectly or whatever other rationalization he cares for).

Beyond that, if one is going to attempt what amounts to overturning a massive body of centuries of rigorous scientific study, he ought to come up with something better than the fact that, after enough tries over eight years, he managed to get students to guess a coin toss correctly 53% of the time. A more thoughtful "scientist" would take contradictory evidence as a sign that he should try harder to find an explanation consistent with existing theory, particularly when the theory is as well-examined and fundamental to all of natural science as causation. Of course if he did that he'd find that there is indeed a very simple and consistent explanation: pure chance. Unfortunately, finding that explanation doesn't help you make a name for yourself or get published (even in psychology), so there is little incentive to do so and, conversely, quite a bit of incentive to make extraordinary claims.

At any rate, since James Randi is still offering a million dollar prize to anyone who can rigorously demonstrate something like this, let's see if this guy puts Randi's money where his mouth is and gives it a try. Something tells me he'll have some excuse like all the rest about how he's not in it for the money (then donate it to charity!) or some such nonsense.

"Do you have a link or a reference to where Feynman mentioned the flaws? Or do you remember what his criticisms were?"

After reading the paper and the methodology, it seems Feynman's criticisms probably don't hold here. He was pointing out (regarding some ESP experiments) how easy it was for occasional trials to go unrecorded, and how that in itself was enough to skew the results into seeming extraordinary. Since the trials here are administered and recorded by computer, that particular bias seems to be avoided.

I'm going to take the Penn and Teller view here: How do most magic tricks work? Exactly how you suspect they do. I'd bet big money that the program is rigged like a Vegas slot machine. Ignore all his claims of worrying about whether the random number generator is adequately random. That's a bunch of smoke. Audit the computer code. I suspect the output is a function of the input.

"If we gave Bem the benefit of doubt and say he has found that one rare experiment that does show ESP; then others must be able to reproduce that almost 100%. I'd wait and watch."

And if others can't reproduce it, the most likely explanation is that the Bem experiment was rigged. When you find a turtle on a fence post, somebody probably put it there.

And if I am right, then yes, that really does call into question lots of research finding small but statistically significant effects based on "random" selection.

There are reputable particle physics experiments that now suggest, "bafflingly, that measurements performed in the future can influence results that happened before those measurements were ever made." See this summary article for laymen from the April 2010 issue of Discover magazine: http://discovermagazine.com/2010/apr/01-back-from...

In terms of quantum mechanics, that all makes sense if you take seriously the fact that the math contains equations with both plus signs and minus signs with respect to time (as when taking the square root of x^2, in which case both -x and +x are answers). Few have, up to now, wanted to take the minus signs seriously--as in moving backward in time.
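The standard textbook example of those minus signs, just to make the square-root remark concrete (the commenter doesn't name a specific equation, so this is only an illustration): the relativistic energy-momentum relation E^2 = p^2*c^2 + m^2*c^4 has two roots, E = +/- sqrt(p^2*c^2 + m^2*c^4), and the Feynman-Stückelberg interpretation reads the negative-energy solutions as particles moving backward in time.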

I would be shocked, however, if this sort of "back from the future" behavior that appears to hold true for particles could somehow aggregate up to the level of a conscious human brain. Thus, this psychology result looks very dubious to me as well.

I'm with Adam - it's kind of amusing how people think they're so much smarter than a Cornell Prof. and four reviewers before they've even looked at the paper...

Ricardo - he goes to great lengths to discuss his random numbers and provides a number of interesting checks against bias - including using different high-quality random number sources, as well as the fact that people only seem to be better at predicting where erotic images show up.

Gorobei - it's possible that it's the word list, but what makes this so unusually compelling is that he uses research protocols from widely accepted and replicated studies - so his materials are relatively uncontentious. (Your cat example doesn't work, btw - even if there are too many cats, why should people be better at recalling the words they would later have to type? If they just remembered "cat" really well, that would apply equally to all words, typed or not.)

Since all experiments are registered at Cornell, I think the file drawer problem is small - it'd be easy to track if he had just done 4 times as many experiments as he claims. Which leaves us pretty much with outright fraud - which doesn't seem terribly likely, either.

I think chances are this will not replicate, but I certainly hope it holds up - my worldview wouldn't be shaken just because current science can't explain it, and I think scientifically proven intuition would be awesome.

Sebastian - reread my comment re recalling the word "cat" and how to get bias. You overweight it on the typed list, and bias appears just because "cat" is easier to remember than "liger."

I could get 53% on a test that predicts what people write in answer to "name a major world city" by just putting a tourism poster for either Paris or Istanbul on a wall the test subjects walk by. If I want to cheat, I vary the poster, and no peer or data review is going to catch that.

Just to reinforce what Thomas says, here is the description from the paper:
"Participants were first
shown a set of words and given a free recall test of those words. They were then given a set of
practice exercises on a randomly selected subset of those words."

note the "randomly selected" - so the cat poster won't do, because you don't know which words they are going to type - if it's the ones with cat or without (unless, as Thomas rightly says, you cheat on the randomization).

gorobei - here's what they do: There is one list that everyone gets and is tested on. Undoubtedly, some words on that list may be easier to remember. Also, you can probably prime people to remember some words on that list better (e.g. the cat poster you mention). But that's not what the experiment tests for:
The second step is to randomly select some of the words from the list for people to write down. These are random and different for everyone.

Now, explain to me how a cat poster would make people more likely to remember the words they were going to write down - which were, remember, different for every subject! It just doesn't make any sense. If half of the words on the list had "cat" in them and the whole room were full of stuffed cats, that boost would still apply to those words for every subject, not just to the words a given subject would later have to type.
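A quick way to convince yourself of that argument (a sketch with made-up recall probabilities; under the null of no precognition, a memorability boost for "cat" words cannot create a practiced-versus-unpracticed gap, because the practice subset is drawn after recall and independently for each subject):

import random

WORDS = 48
CAT_WORDS = set(range(12))  # hypothetical: the first 12 words are boosted "cat" words

def run_subject():
    # Recall happens first; "cat" words are simply easier to remember for everyone.
    recalled = {w for w in range(WORDS)
                if random.random() < (0.7 if w in CAT_WORDS else 0.4)}
    # Only afterwards is the to-be-practiced subset chosen, at random, per subject.
    practiced = set(random.sample(range(WORDS), WORDS // 2))
    unpracticed = set(range(WORDS)) - practiced
    return (len(recalled & practiced) / len(practiced)
            - len(recalled & unpracticed) / len(unpracticed))

random.seed(1)
diffs = [run_subject() for _ in range(20_000)]
print("mean recall(practiced) - recall(unpracticed):", sum(diffs) / len(diffs))
# Comes out near zero: the cat boost raises recall across the board, but it cannot
# favor the randomly chosen practice words over the rest.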

Also this:
"randomly selected" does not allow one to assert the actual sample generated is unbiased. You need to look at the actual sample, not the generating process.

is wrong. Bias in statistics refers to systematic (non-random) error in a sample or measurement, so if you know the generating process is unbiased, you know the sample is not biased. But maybe you mean "actual population" and not actual sample? It would help if you could explain, step by step, how you think the experiment could have been tampered with other than through faulty randomization.

OK - I still don't really understand much except the example - and there you're wrong.
Let's assume in your sample, for simplicity, that one guy wins 60 million and all the others win nothing.
Then the standard deviation of a single draw is sqrt{((6*10^7 - 60)^2 + 999,999*(-60)^2)/1,000,000} - I did the calculation quickly, but that comes out to about 60,000 - so the standard error of the average over the 1,000,000 draws is 60,000/sqrt(1,000,000) = 60.
Which means that the 95% confidence interval for the mean winnings in the lottery is roughly (taking 2 SEs instead of 1.96) {-60, 180}, which definitely includes the correct average win of 60.
Obviously you can have fluke samples - that's why you have confidence intervals and don't just give point estimates.
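For what it's worth, a quick numeric check of that arithmetic (a sketch; it just assumes the 1,000,000-ticket, single 60-million winner setup above):

import math
import statistics

winnings = [60_000_000] + [0] * 999_999

mean = statistics.fmean(winnings)     # 60.0
sd = statistics.pstdev(winnings)      # ~60,000: standard deviation of a single draw
se = sd / math.sqrt(len(winnings))    # ~60:     standard error of the mean
print(f"mean={mean:.0f}  sd={sd:.0f}  se={se:.1f}  95% CI ~ ({mean - 2*se:.0f}, {mean + 2*se:.0f})")
# The true expected win, 60,000,000 * (1/1,000,000) = 60, falls inside the interval.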

So unfortunately your example doesn't help me understand your point.
Your last remark about M-of-N algorithms sounds to me, though, as if we have just been talking past each other - for me, the assignment of the treatment is part of what I call randomization (and the experimental literature I know calls it randomization, too). If that's not done correctly, randomization is flawed and sure, the outcome will be biased. But in order to combine that with some type of subtle priming like your cat poster in order to bias the result in a specific direction, the researcher needs to do it on purpose (and that would be visible both in the code and the samples). So what you're saying is that Bem could have falsified his experiment - which is certainly true, but also nothing that anyone here has ever doubted as a possible explanation.

"Your last remark about M-of-N algorithms sounds to me, though, as if we have just been talking past each other - for me, the assignment of the treatment is part of what I call randomization (and the experimental literature I know calls it randomization, too). If that's not done correctly, randomization is flawed and sure, the outcome will be biased. But in order to combine that with some type of subtle priming like your cat poster in order to bias the result in a specific direction, the researcher needs to do it on purpose (and that would be visible both in the code and the samples)."

I have the idea he is saying something else. For example, there are various stupid RNGs for picking one out of N items that result in item zero showing up only half as often as the others. And there are lots of ways to implement M-of-N selection algorithms wrongly. If the same method that is used to choose, say, the order of items to present is also used to choose which items to put on the typed list, it could result in a correlation that would vary with the number of items presented. You could easily get a 3% bias that way with some number of items.

In my example where item zero is selected less often, the first item on the machine's list would be less likely to be presented first, and also less likely to be one of the items to be typed.
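For a concrete instance of the kind of "item zero shows up half as often" bug being described (an illustrative sketch in Python - nobody here knows what REALBasic's code actually did): picking an index by rounding a uniform float gives the two end items half the probability of the middle items.

import random
from collections import Counter

N = 8
counts = Counter(round(random.uniform(0, N - 1)) for _ in range(100_000))
for item in range(N):
    print(item, counts[item] / 100_000)
# Items 0 and N-1 land near 1/(2*(N-1)) ~ 7%, the middle items near 1/(N-1) ~ 14%,
# instead of the intended uniform 1/N = 12.5%.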

If something like this turns out to be the explanation, then this paper could be tremendously valuable -- if it persuades researchers everywhere to be careful to use correct algorithms. If it got even 5% more people to bother creating fake test populations with defined statistical qualities, and then sampling from them repeatedly to see how often they get the "correct" results, that would be a valuable improvement.

Sebastian,

I reread the parts of the paper on the erotic picture test. They do not give me confidence...

"For this purpose, 40 of the sessions comprised 12 trials using erotic pictures,
12 trials using negative pictures, and 12 trials using neutral pictures. The sequencing of the
pictures and their left/right positions were randomly determined by the programming language’s
internal random function."

Later in the paper we find out the programming language was REALBasic. So he's probably using a linear congruential RNG. These are fast but have horrible statistical properties. I don't have the constants for REALBasic's generator, but classic C rand()%24 is predictable at well over 3% above chance given only the previous number.
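A small harness to check that kind of predictability (a sketch - the constants below are the classic old C-library LCG, not REALBasic's actual generator, which none of us know):

from collections import Counter, defaultdict

def lcg(seed, a=1103515245, c=12345, m=2**31):
    # Classic linear congruential generator; the low bits cycle with a very short period.
    while True:
        seed = (a * seed + c) % m
        yield seed

gen = lcg(42)
vals = [next(gen) % 24 for _ in range(200_000)]

# How often can we guess the next value mod 24, knowing only the previous one?
table = defaultdict(Counter)
for prev, nxt in zip(vals, vals[1:]):
    table[prev][nxt] += 1
hits = sum(counter.most_common(1)[0][1] for counter in table.values())
print("best-guess accuracy:", hits / (len(vals) - 1), "vs. chance:", 1 / 24)
# For this generator the accuracy comes out far above 1/24, because the next value
# mod 8 is completely determined by the previous value's low bits.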

"The remaining 60 sessions comprised 18 trials using erotic pictures
and 18 trials using nonerotic positive pictures with both high and low arousal ratings. These
included eight pictures featuring couples in romantic but nonerotic situations (e.g., a romantic
kiss, a bride and groom at their wedding). The sequencing of the pictures on these trials was
randomly determined by a randomizing algorithm devised by Marsaglia (1997), and their
left/right target positions were determined by an Araneus Alea I hardware-based random number
generator. (The rationale for using different randomizing procedures is discussed in detail
below.)"

That sounds better, but why were the first 40 data points not just thrown out completely?

"Professor of Psychology at Duke University, suggested that I run a virtual control experiment
using random inputs in place of human participants (personal communication, October 10,
2009). In particular, if the human participant is replaced by the same PRNG or RNG that selects
the left/right target positions, this maximizes the possibility that any non-random patterns in the
sequence of left/right target positions will be mirrored by similar patterns in the left/right
responses of the virtual participant (the RNG itself), thereby producing an artifactual psi-like
result. A null result implies that no such patterns were present."

This maximizes nothing, and a null result implies nothing.
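One way to see why the control is weak (a sketch of the simplest possible flaw, a plain left/right rate bias of a hypothetical size eps; real flaws could be sequential patterns instead, which this doesn't model):

import random

eps = 0.03
trials = 1_000_000
biased_side = lambda: "L" if random.random() < 0.5 + eps else "R"

# Virtual control: the "response" comes from the same biased generator as the target.
virtual_hits = sum(biased_side() == biased_side() for _ in range(trials))
# A human who simply favors the left side, with no psi at all.
human_hits = sum(biased_side() == "L" for _ in range(trials))

print("virtual control hit rate:", virtual_hits / trials)  # ~0.5 + 2*eps**2 = 0.5018
print("left-favoring human     :", human_hits / trials)    # ~0.5 + eps      = 0.53
# The artifact in the control is only second order in the bias, so the control can
# look null while a participant with a side preference still scores well above chance.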

If the RNG is bad for a portion of the trials, and people pay attention and actively want to get the naked people when clicking for porn (I'll take that as true for at least some*), I see nothing promising here.

* which rather explains the difference between the extrovert and introvert participants.

Does he program the experiments himself? If not, then which is more probable?

A) All our models of causality are wrong
B) The programmer has some reason to screw with him
C) He is making up data
D) There is a hidden bias in his experiments that he is not aware of
