…an analysis of strike outs (failing to hit the ball three times in a
row) in American baseball from 1913 to 2006 showed that players whose
first or last names began with K suffered significantly more strikeouts
than other players. Why? Because in baseball scoring, K is used to
denote a strikeout – "For players with this initial, the explicitly
negative performance outcome may feel implicitly less aversive," the
researchers said.

Next, an analysis of 15 years of MBA students’
grades at a large American University showed that students with the
initials C or D achieved significantly lower grades than students whose
initials were unrelated to grade scores, and students with the initials
A or B.

Was this due to the students’ self-preference for their
initials or was it the examiners showing the bias? To test this, Nelson
and Simmons asked hundreds of other undergrads to report their liking
for the different letters of the alphabet. A subsequent analysis of
their exam scores again showed that students with the initials C or D
performed less well, but only if they had previously shown a preference
for these letters. This shows that affection for one’s own initials
really is playing a role in the patterns being observed here.

Another
study showed how far-reaching these effects can be. An analysis of
392,458 lawyers who studied at 170 law schools showed that as the
quality of law schools declined, so too did the proportion of lawyers
with the initials A or B who had attended.
Here is more. When I think of the letter K, I think of Ted Kluszewski, Harmon Killebrew, and brawny Poles who swing for the fences. Maybe that’s lame, but I don’t see that names with "S" strike out more often, or that names with "H" hit more home runs. Maybe "K" has special power, just ask Franz Kafka. The A and B stuff puzzles me too, but it also doesn’t seem consistent with other parameters on the power of suggestion. I also would expect the C and D names to do better than average, given all the names lower in the alphabet, at least if there is going to be an effect at all. Are people whose names start with the letter "B" more likely to be bloggers?
I’m not contesting the raw tabulations but my gut feeling is that the letters in your name correlate with physique or education or IQ in some other way. One paper is here, I don’t see a lot of controls.
Addendum: Andrew Belman seems puzzled too. Alex has a related post, he doesn’t feel totally puzzled.
Hm. There’s an awful lot of “Dave”s in the world, and an awful lot of girls’ names beginning with A. Might the CD/AB thing be a correlation with gender?
(Mind you the first Dave I met was my MIT-grad, PhD-having dad, so I guess he did OK in school…)
The K thing, I have to just wonder if that’s a sample size thing, since I would expect K to be a pretty rare initial. I’d be interested to see if this is different in 20 or 30 years given what a huge fad there’s been for spelling babies’ names with K regardless of their traditional spelling, thus presumably increasing the sample size a lot (though possibly skewing it in other ways, as that fad does not affect all groups equally; I, for instance, know how to spell).
People with E names don’t make more errors.
See here.
I am having a hard time seeing why this is a puzzle. Just look at 100 things and you will find that 5 are significant by chance. These pattern recognition studies are merely reporting those significant patterns observed.
Just wait, from the next data set scholars will report a different 5 relationships that are “discovered.”
To me, this amounts to a great example of why theory needs to guide empirical work. This is ridiculous research when one finds observations by chance, describes it as a finding, and does post hoc theory.
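A rough sketch of the "100 things, 5 significant" arithmetic above, assuming the tests are independent and every null hypothesis is true (the function names and the 0.05 threshold are illustrative choices, not anything taken from the paper):

```python
import random


def expected_false_positives(n_tests=100, alpha=0.05):
    """Expected number of 'significant' results when every null is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests=100, alpha=0.05):
    """Chance that at least one of n independent true-null tests is significant."""
    return 1 - (1 - alpha) ** n_tests


def count_false_positives(n_tests=100, alpha=0.05, seed=0):
    """Simulate one batch of tests.

    Under a true null, a p-value is uniform on [0, 1], so drawing a
    uniform random number stands in for running one test.
    """
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(n_tests)]
    return sum(p < alpha for p in p_values)


if __name__ == "__main__":
    print(expected_false_positives())
    print(prob_at_least_one())
    print(count_false_positives())
```

Under these assumptions the expected count is 5 and the chance of at least one spurious hit is about 0.994. Running the same formula with 26 tests (one per initial) gives about 0.74.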
“I am having a hard time seeing why this is a puzzle. Just look at 100 things and you will find that 5 are significant by chance. These pattern recognition studies are merely reporting those significant patterns observed.
Just wait, from the next data set scholars will report a different 5 relationships that are “discovered.”
To me, this amounts to a great example of why theory needs to guide empirical work. This is ridiculous research when one finds observations by chance, describes it as a finding, and does post hoc theory. ”
Well said! Thank you Chicago economist!
If the letter k in your name made you strike out a lot, then how would you make it to the majors in the first place? This is just data mining.
It seems that the commenter who noted gender bias in the A’s/B’s study is on the right track. Perhaps it’s the worst dataset out there, but a quick look at my cell’s phone book shows a majority of the A/B’s are female and a vast majority of the C/D’s male.
We could delve into it even further and check differences in distributions of initials between different income groups. Like the “K” correlation, I bet this one will also be explained away when properly controlled.
Most of these criticisms are answered in the paper.
- This empirical research was guided by theory, or at least by previous research, on what is known as the name-letter effect; see here for a brief review.
- The authors did control for era. In their words: “When we controlled for average year of play (and excluded initials associated with fewer than 5 Major League players—e.g., U as a first initial), K was both the first initial and the last initial associated with the highest strikeout rate.”
- The Hardball Times (which found a higher strikeout rate for J’s than K’s) did not control for era. (Their study also differed in whether they gave each player the same weight or each plate appearance the same weight. Think about which is a fairer test, then follow Phil’s link to see which did which.)
- The authors did make an attempt to control for ethnicity, although they only did so for foreign-born players. In their words: “The effect was also reliable when we controlled for country of origin with a dummy variable for each of the 52 countries represented in the sample.”
- The blog study finding that E named players do not have a higher error rate only looked at one year of MLB data. (And there could be a real K-strikeout effect even if there really is no E-error effect, since the process for avoiding strikeouts might differ from the process for avoiding errors in some relevant way.)
- In the A/B vs. C/D academic performance study, the authors controlled for gender, along with ethnicity, U.S. citizenship, and graduation year.
- People are right to be concerned about data mining. The question is how many other letter correspondences they (and other researchers) looked at before they found the K-strikeout relationship. That is less of a concern with the ABCD study, since they have three follow-up studies that provide converging evidence. One is an experimental study in which participants were randomly assigned to have either the good prize or the bad prize (or neither) have a label that matched their first initial.
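To make the weighting point in the Hardball Times item concrete, here is a small sketch with two made-up players. The numbers are hypothetical and chosen only to show how far the two averages can diverge; they are not from either study.

```python
def per_player_rate(players):
    """Average of each player's own strikeout rate (every player counts equally)."""
    rates = [strikeouts / plate_apps for strikeouts, plate_apps in players]
    return sum(rates) / len(rates)


def per_plate_appearance_rate(players):
    """Pooled strikeout rate (every plate appearance counts equally)."""
    total_strikeouts = sum(strikeouts for strikeouts, _ in players)
    total_plate_apps = sum(plate_apps for _, plate_apps in players)
    return total_strikeouts / total_plate_apps


# Hypothetical (strikeouts, plate appearances) pairs:
# a regular starter and a brief call-up.
players = [(150, 600), (5, 10)]

if __name__ == "__main__":
    print(per_player_rate(players))            # (0.25 + 0.50) / 2 = 0.375
    print(per_plate_appearance_rate(players))  # 155 / 610, roughly 0.254
```

Weighting by player lets a ten-plate-appearance call-up move the average as much as a full-season regular, so the two methods can disagree even on identical data.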
When did so-called economists stop caring about things that are actually important, and start doing these quirky little papers that might get written up in the newspaper but ultimately make no difference to the world?
To the authors of the cited study: Yes, yes, you proved you can play the game. Maybe you’ll do well on the job market and get some things published. Maybe you’ll even get a fat advance for a freakonomics-style book.
The rest of the world will still despise you.
It seems like something is going on here. I know that people from different classes favor different child names. However, schools only assign grades A-D, not A-Z. So the favoring of A’s and B’s could just be a coincidence.
How correlated are first initials with race? The distribution of outcomes conditional on first initial could be capturing the correlation between race and outcomes. (Though my guess is that these are just spurious relations.)
Mike, the problem with using only one year of data is that if there’s an E-error effect it’s probably too small to show up in a dataset of that size. You need a lot of data points to find a small effect, and there just aren’t that many E players. In technical terms, given the expected effect size (which we can estimate based on the K-strikeout study) there’s a high probability of a type II error with that sample size.
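For anyone who wants to see the type II error point in numbers, here is a back-of-the-envelope power calculation using a normal approximation. The error rates and sample sizes below are placeholders I made up for illustration, not figures from the paper or the blog study.

```python
from statistics import NormalDist


def power_vs_baseline(p0, p1, n, alpha=0.05):
    """Approximate power of a two-sided z-test of 'rate == p0'
    when the true rate is p1 and there are n independent chances."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se_null = (p0 * (1 - p0) / n) ** 0.5
    se_true = (p1 * (1 - p1) / n) ** 0.5
    upper = p0 + z_crit * se_null  # reject if the observed rate is above this
    lower = p0 - z_crit * se_null  # ... or below this
    return (1 - z.cdf((upper - p1) / se_true)) + z.cdf((lower - p1) / se_true)


if __name__ == "__main__":
    # Hypothetical: baseline error rate 2.0%, E players at 2.2%.
    for n in (2_000, 20_000, 200_000):
        power = power_vs_baseline(0.020, 0.022, n)
        print(n, round(power, 3), "type II error:", round(1 - power, 3))
```

With an effect this small, power stays low until the number of chances gets very large, which is the sense in which a single season of E players is unlikely to detect anything even if the effect is real.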
I don’t think that the argument in your 2nd paragraph goes through. A slightly higher strikeout rate would keep some borderline Major Leaguers out of the big leagues, which would make the effect size among Major Leaguers somewhat smaller than the actual effect of a K initial, but it’s not going to erase the relationship. If decisions about who made the Majors were based solely on strikeout rate then the effect might get wiped out, but since strikeout rate is only one of many factors that affect performance (and thus roster decisions), the higher strikeout rate will still hold up.
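The selection argument can be checked with a toy simulation. Everything here is an assumption made for illustration: skills are in made-up standardized units, half the prospects get a K initial so the groups are easy to compare, and the size of the K penalty is arbitrary.

```python
import random


def simulate_selection(n=200_000, k_share=0.5, k_penalty=0.5,
                       other_sd=1.7, keep_frac=0.10, seed=1):
    """Return (gap among all prospects, gap among those who make the majors).

    'gap' is the mean strikeout tendency of K players minus that of
    non-K players. Roster value is other skills minus strikeout tendency.
    """
    rng = random.Random(seed)
    prospects = []
    for _ in range(n):
        is_k = rng.random() < k_share
        strikeout = rng.gauss(0.0, 1.0) + (k_penalty if is_k else 0.0)
        other_skills = rng.gauss(0.0, other_sd)
        value = other_skills - strikeout
        prospects.append((value, strikeout, is_k))

    # Only the most valuable prospects reach the majors.
    prospects.sort(key=lambda p: p[0], reverse=True)
    majors = prospects[: int(n * keep_frac)]

    def gap(group):
        k = [s for _, s, is_k in group if is_k]
        non_k = [s for _, s, is_k in group if not is_k]
        return sum(k) / len(k) - sum(non_k) / len(non_k)

    return gap(prospects), gap(majors)


if __name__ == "__main__":
    all_gap, majors_gap = simulate_selection()
    print("gap among all prospects:", round(all_gap, 3))
    print("gap among major leaguers:", round(majors_gap, 3))
```

In this setup the gap among selected players comes out smaller than the gap among all prospects but stays above zero, because strikeouts are only one input into who gets selected.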
players whose first or last names began with K suffered significantly more strikeouts than other players.
As Chicago Economist suggests, it is highly likely, ex ante, that there is some letter such that players whose names start with it strike out significantly more than other players. So the magic letter turns out to be K. Interesting but meaningless data mining.
The strikeout is not a measure of success. K is only one of many possible outcomes. Focusing on a single category of outcome rather than a player’s aggregate outcome is not sensible.
I also dislike that the significance is reported but not the confidence interval. It is quite possible to have a significant result with a large confidence interval.
The ABCD data at least show error bars. The graph is misleading, but the error bars indicate less than a 2 standard deviation effect.
I’m surprised Tyler didn’t think of Theodore Kaczynski.