You are fairly predictable, perhaps

The new article is “Private traits and attributes are predictable from digital records of human behavior,” by Michal Kosinski, David Stillwell, and Thore Graepel.  Here is the abstract:

We show that easily accessible digital records of behavior, Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. The analysis presented is based on a dataset of over 58,000 volunteers who provided their Facebook Likes, detailed demographic profiles, and the results of several psychometric tests. The proposed model uses dimensionality reduction for preprocessing the Likes data, which are then entered into logistic/linear regression to predict individual psychodemographic profiles from Likes. The model correctly discriminates between homosexual and heterosexual men in 88% of cases, African Americans and Caucasian Americans in 95% of cases, and between Democrat and Republican in 85% of cases. For the personality trait “Openness,” prediction accuracy is close to the test–retest accuracy of a standard personality test. We give examples of associations between attributes and Likes and discuss implications for online personalization and privacy.
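The pipeline the abstract describes (compress the Likes data, then feed the compressed features to a regression) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes truncated SVD on a column-centered user-by-Like matrix as the dimensionality-reduction step, because the abstract only says "dimensionality reduction." It uses plain gradient-descent logistic regression for a single binary trait and runs on synthetic data. The helper names (reduce_likes, fit_logistic, demo) and every number in it are hypothetical.

```python
import numpy as np

def reduce_likes(likes: np.ndarray, k: int) -> np.ndarray:
    """Project a binary user-by-Like matrix onto its top-k SVD components (assumed step)."""
    centered = likes - likes.mean(axis=0)            # center each Like column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]                          # each user's scores on the k components

def _with_bias(x: np.ndarray) -> np.ndarray:
    """Append a constant column so the model learns an intercept."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

def fit_logistic(x: np.ndarray, y: np.ndarray, lr: float = 0.1, steps: int = 2000) -> np.ndarray:
    """Logistic regression fit by batch gradient descent on the mean log-loss."""
    xb = _with_bias(x)
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(xb @ w)))
        w -= lr * xb.T @ (p - y) / len(y)            # gradient of the mean log-loss
    return w

def predict_proba(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Predicted probability that each user has the trait."""
    return 1.0 / (1.0 + np.exp(-(_with_bias(x) @ w)))

def demo(n_users: int = 200, n_likes: int = 50, k: int = 5, seed: int = 0):
    """Synthetic check: the first 10 Likes are more common among users with the trait."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_users)             # hypothetical binary trait
    rates = np.full((n_users, n_likes), 0.1)
    rates[y == 1, :10] = 0.6
    likes = (rng.random((n_users, n_likes)) < rates).astype(float)
    features = reduce_likes(likes, k)
    w = fit_logistic(features, y.astype(float))
    probs = predict_proba(features, w)
    acc = float(np.mean((probs > 0.5) == y))         # in-sample accuracy only
    return probs, acc

if __name__ == "__main__":
    probs, acc = demo()
    print(f"training accuracy: {acc:.2f}")
```

The accuracy printed here is in-sample fit on toy data. A fair evaluation would score users held out from fitting, and for binary traits the study reports AUC rather than raw accuracy.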

For the pointer I thank Brandon Robison.


What about gnomosexual anarcho-syndicalists?

100%. You're the only one, right?

A dead giveaway if you posted the bi-weekly meeting agenda.

.......homosexual and heterosexual men in 88% of cases, African Americans and Caucasian Americans in 95% of cases, and between Democrat and Republican in 85% of cases.......

Is the correct conclusion that race was the most identifiable characteristic the computer found? I don't use Facebook, but I know that one could like Obama or Romney in the last election. It is also possible to like political parties. How could it be harder to identify D vs. R? Or is the 15% a reflection of DINOs and RINOs?

I think there may be a gap between someone's self-identification and their functional identity as revealed in their behavior.

Since race is less malleable there is greater accuracy.

Perhaps this study has more to do with gaps between our stated and revealed preferences than with Facebook’s ability to categorize us by likes.


If I had to guess, the 15% may well be a mix of libertarians and other minority political movements, though I'm sure there's some truth to the DINO/RINO hypothesis. Someone who "liked" legalize marijuana and pro gay marriage pages as well as a number of Republicans running for office is probably really hard to peg if you're using a binary metric.

I considered that; however, Christianity vs. Islam was predicted at 82%. I doubt only 18% identify as "other," so I presumed that they had excluded "other" from their calculations.

E.g., I am assuming that if the computer guessed D and the correct answer was not R, the result wasn't included in the data set.

I couldn't find a definitive answer at the link.

For every person, the model gives a "Republican" score. They don't predict if you're a Republican or not, they just give you a score. Then you randomly select one Democrat and one Republican from the data. 85% of the time, the model would give the Republican a higher "Republican" score.

I wonder what you can infer if you never Liked anything, or Liked everything.

Easy. For the first case that you are a 67 year old divorced male, for the second, that you're a 14 year old girl.

"The model correctly discriminates between homosexual and heterosexual men in 88% of cases"

Just guess heterosexual and you do way better than 88%...

Nope. That was my first thought too!

They are not that stupid. :) By their metric your "guess hetero" strategy would only be 50% right.

"Fig. 2 shows the prediction accuracy of dichotomous variables expressed in terms of the area under the receiver-operating characteristic curve (AUC), which is equivalent to the probability of correctly classifying two randomly selected users one from each class (e.g., male and female)."
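The pairwise definition quoted above (and the Democrat/Republican explanation earlier in the thread) can be checked directly: counting every (positive, negative) pair gives the same number as the probability for one randomly drawn pair. A minimal Python sketch follows; the function names, toy scores, and the 5%/95% class split are hypothetical illustrations, not figures from the study. It also shows why "just guess heterosexual" can look good on raw accuracy yet earns only 0.5 on AUC.

```python
from itertools import product

def pairwise_auc(scores, labels):
    """Share of (positive, negative) pairs in which the positive scores higher.

    Ties count as half a win, so a constant score yields exactly 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy(predictions, labels):
    """Fraction of hard 0/1 predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Toy scores: three members of class 1 and three of class 0.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(pairwise_auc(toy_scores, toy_labels))   # 8 of 9 pairs ordered correctly

# Hypothetical 5%/95% split; "always guess the majority class" as a constant score.
labels = [1] * 5 + [0] * 95
always_majority = [0] * 100
print(accuracy(always_majority, labels))      # 0.95
print(pairwise_auc(always_majority, labels))  # 0.5
```

Because a constant guess ties on every pair, it cannot rank anyone, which is exactly what the AUC measures regardless of how lopsided the classes are.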

"expressed in terms of the area under the receiver-operating characteristic curve (AUC)"

+1! About time somebody figured out how to do this correctly.

“The model correctly discriminates between homosexual and heterosexual men in 88% of cases”

It is a mistake to take self-identification as the definitive "right" answer in any of these categories.

Ninety percent in anything may approach the limits of measurement given the number of people out there who are in the closet or sexually ambiguous or conflicted. Generally some percentage of the population has no particular stability or consistency in their views about most things.

In other words, stereotypes about who "likes" what on Facebook are reasonably accurate, with racial stereotypes the most accurate:

No, by mapping likes they are mapping sub-cultural boundaries. That the black-white cultural boundary is more apparent than various white-white divides is not that surprising, and does not actually depend on anyone's stereotypes.

You have it backwards unless you really didn't mean "depend on anyone's stereotypes".

Not sure what you mean. I'm thinking that if I'm an Orange County resident and fly fisher, that might be strong evidence that I am part of a Republican group, but a Massachusetts fly fisher might break differently. Does it matter what anyone's stereotypes for fly fishers were?

(If you think your stereotypes are just accurate representations of current big data in transition, then you are just saying you support big data ... and expressing the hubris that you have it between your ears.)

I would say that the stereotypical fly fisherman is an older, upscale white male. He also likes upscale outdoorsy pastimes like hiking and skiing. Drinks imported beer or microbrews.

The stereotypical bass fisherman, on the other hand, is kind of a redneck. He likes NASCAR, country music and Larry the Cable Guy movies. He drinks Bud Light.

I'd only venture that the stereotypical bass fisherman lives near a lake with bass ;-)

No, it does not matter, which is fundamentally my point. Why write "depend on stereotypes" when the likes, that is, the big data, only support such stereotypes? That is what Steve is trying to say, or perhaps that's my generous interpretation. Stereotypes do not grow out of thin air.

I'm not saying anything about _my_ stereotypes. I try to avoid them entirely, but they exist.

Well played with the bass fisherman and certainly, I have little to no hubris to express here.

Not at all!

The fact that there exists some good algorithmic means to sniff out race does not mean the Sailerisque stereotype would sniff it out that well.

Here are the Top 10 lists from the Cambridge study of likes that most discriminate among demographic and personality groups: pretty darn stereotypical:

e.g., David Bowie: who do you think likes David Bowie more, whites or African-Americans?

Sorry, here's the URL of the Top 10 lists from the Cambridge Study.

Some of that is hilarious: Apparently there's a high likelihood of a person being Muslim if he liked "I’m A Muslim & I’m Proud" or "Hadith Of The Day".

OTOH, he's Christian if he liked "I’m Proud To Be Christian" or "Jesus Christ".

Duh! What a prediction! Sometimes PNAS surprises me. Does this study deserve to be in there at all?!

In fairness, the supplement only says these are the most predictive likes. The authors address the point of whether their model simply relies on trivial indicators with the following: "Moreover, note that few users were associated with Likes explicitly revealing their attributes. For example, less than 5% of users labeled as gay were connected with explicitly gay groups, such as No H8 Campaign, “Being Gay,” “Gay Marriage,” “I love Being Gay,” “We Didn’t Choose To Be Gay We Were Chosen.” Consequently, predictions rely on less informative but more popular Likes, such as “Britney Spears” or “Desperate Housewives” (both moderately indicative of being gay)."

So the vast majority of people who like "Jesus Christ" are Christian. The converse is not true, though: the vast majority of Christians do not "like" obvious indicators like Jesus or the Bible.

I am shocked. Liking Desperate Housewives is only moderately indicative of being gay?

Sorry, but when I read "highly sensitive personal attributes" I expected something a lot more impressive than skin color. The abstract builds up as if it can determine some deep truth about your soul, but it's just that if you like the Yeah Yeah Yeahs and tweed jackets it recognizes that you're a whitey.

There's no deep truth; all that matters is knowing enough about you to sell you shoes, concert or plane tickets, or a new car.

If you like Rush Limbaugh, you're a Republican. If you like Tyler Perry, you're black. If you like RuPaul, you're gay. If you like your husband, you're in a relationship. And remember that the people in this study were volunteers and averaged 170 likes each.

The programmers at fb should take note of this. I remember during the last election many of my liberal friends were angry that fb suggested they like Mitt Romney's page. I guess at the time I was shocked that people would get so offended by an impersonal computer algorithm, but it seems now that there are ways around it.

Do you think there might have been some money changing hands there? ;-)

I remember, early in Amazon history, when the site claimed that men's underwear was the most popular item of the day. That might have been a glitch, or they really, really, wanted to move some underwear.

"I guess at the time I was shocked that people would get so offended by an impersonal computer algorithm"

No, they were offended by the fact that he was in the hated outgroup. They wouldn't have been offended if it suggested a product designed for the opposite sex, but then again, insinuating that they are the wrong sex isn't as bad as calling them *Republican.*

Really, if it were not possible to make such distinctions between people in the "big data," advertisers would certainly have tired of it by now!

> "...and discuss implications for online personalization and privacy."

Oh I wonder what those could be.

The best predictors of high intelligence include “Thunderstorms,” “The Colbert Report,” “Science,” and “Curly Fries,” whereas low intelligence was indicated by “Sephora,” “I Love Being A Mom,” “Harley Davidson,” and “Lady Antebellum.” Good predictors of male homosexuality included “No H8 Campaign,” “Mac Cosmetics,” and “Wicked The Musical,” whereas strong predictors of male heterosexuality included “Wu-Tang Clan,” “Shaq,” and “Being Confused After Waking Up From Naps.”

Apparently Republicans like:

George W Bush
John McCain
Rush Limbaugh
Sean Hannity
Bill O'Reilly
Positively Republican
Sarah Palin
Ronald Reagan
Glenn Beck

While Democrats like:

Joe Biden
Nancy Pelosi
Health Care Reform
The White House
Barbara Boxer
Anthony Weiner
Being Liberal
Left Action
Barack Obama2012
Ted Kennedy

Who knew?

The Like data is a literal representation of what the person likes. This isn't all that impressive.

And as someone who does not have a Facebook account, I suppose that they can create a profile of me, too.

It is interesting that liking Nancy Ajram, a Christian Lebanese female singer, is predictive of being Muslim. I would think that says more about the demographics of Muslim Facebook users than anything else. After all, most American Muslims are probably not Arabic speakers and so would have little interest in Ms. Ajram's music. In fact, I wonder whether the Nation of Islam and its offshoots make up a majority of Muslims in the US. But Arabic speakers are probably more middle class than African or African American Muslims and hence more likely to sign up.

Even most Black Muslims aren't members of the NOI.
