In a recent research paper, Weber and Castillo report:
How does the web search behavior of "rich" and "poor" people differ? Do men and women tend to click on different results for the same query? What are some queries almost exclusively issued by African Americans? These are some of the questions we address in this study. Our research combines three data sources: the query log of a major US-based web search engine, profile information provided by 28 million of its users (birth year, gender and zip code), and US-census information including detailed demographic information aggregated at the level of ZIP code. Through this combination we can annotate each query with, e.g., the average per-capita income in the ZIP code it originated from….
Here are a few details:
What kind of web results would you personally want to see for the query "wagner"? Well, if you are a typical female US web user you probably have pages about the composer Richard Wagner in mind. However, if you are a male US web user you are more likely to be referring to a company called Wagner which produces paint sprayers. Similarly, the term most likely to complete the beginning "hal" is in general "lindsey," [an evangelist and Christian writer] whereas for people living in areas with an above average education level the most likely completion is "higdon." [an American writer and runner]
And what are the "most discriminating" search queries for various demographic groups? Suddenly I felt awkward reading this piece. Do note that the method of construction means the list will be dominated by demographically skewed neighborhoods, which need not be representative of the group as a whole.
Below the poverty line:
slaker [seems to be an informal misspelling of "slacker"]
kipasa [Spanish-language animation site]
I had never heard of any of those.
If you have a BA, the most discriminating search query is "spencer stuart executive search," followed by some other boring-sounding choices, such as "four seasons jackson hole." To continue with some groups:
Whites:
pulloff.com [concerns tractors and motorsport]
central boiler wood furnace
firewood processors
midwest super cub
African-Americans:
trey songs bio [should be "songz"]
def jam records address
s2s magazine
madinaonline [sells body oils]
There is more information on p.5 of the paper. Can you guess which group is well-predicted by the search query "jingos para baby shower"?
For the pointer I thank David Curran.

One worry with any personalization technology is that it can become self-reinforcing. Say search engines decide which rental listings to return based on race. You could end up with an algorithmic apartheid.
There are trade-offs between returning more relevant information and grouping you so tightly that you do not learn about other options. If algorithms help decide what information you have access to, a more general discussion of quality versus diversity of information might be useful.
Computers are going to have to start understanding context if they are going to start understanding human communication. The way we talk is underspecified if you only consider the content of the words.
I don’t buy the idea that search engines will make us more different via autofill. Even if my search engine knows I’m a man and I type in “Wagner” looking for the composer, I’m not going to be distracted when the paint sprayer company pops up, forget about the composer and suffer an even larger gap in the relationship with my wife. Nor will I be put off in my search for the Def Jam address just because Google has probably figured out that I’m white.
It’s just the cultural variety made possible by the Internet that’s quickly destroying any possible assertion of American mass culture beyond the Super Bowl and a few other huge things. I mean, I had never heard of the below-the-poverty-line query items, but when an information omnivore like Tyler admits complete ignorance, that’s a scary demonstration of how hard it will be for us to know one another.
Of course, this started quite a while back with TV. Back when we only had the networks and a few UHF stations, everyone watched largely the same things because they had no choice. As channels grew, that changed. Now if you look at viewing habits divided by race or age, they’re astonishingly different. (They’re different by gender too, but not as different, because a lot of cohabiting couples watch TV together.)
The other obvious force that creates cultural fragmentation is mass immigration, which never seems to coincide with periods of mass media, so we’ll never be able to see how much faster mass media would make the assimilation process. We cut immigration off in the 20s, just as radio was becoming popular, and though we re-opened the gates in the 60s, when network TV was still king, the big numbers didn’t come till the 90s, when cable was everywhere and the Internet began to rise.
I wonder how the results would differ if the exact same study were done using Google’s data. I suspect that more savvy web users patronize Google.
Facebook is sitting on a goldmine of demographic information about you. A great deal can be inferred from your social graph and your list of “likes” — wasn’t there a paper not long ago that purported to demonstrate that a computer program could guess with reasonable accuracy whether a Facebook user was gay, merely based on their set of friends?
This suggests that Google has a problem in its future. After all, they displaced earlier incumbents like AltaVista by offering a clearly better alternative, with PageRank delivering the most relevant links in the top 10 results. It would seem that Facebook has a clear window of opportunity to build a better search engine than Google.
Unfortunately for Google, they simply don’t get social networks (see the recent “pandas and lobsters” blog post for a good metaphor as to why). And it’s unlikely that they can solve the problem with an acquisition (say, Myspace) because they have an awful track record of letting acquired websites wither and die. A cultural shift is probably urgently needed, lest they turn into a latter-day Microsoft.
I pity the man who finds the Four Seasons Jackson Hole boring.
I think the consequences of a really nuanced application of this demographic-based search personalization are scary for political and news research. Clearly there is already a great deal of selective consumption of news and political commentary, but that doesn’t mean it couldn’t get worse. Imagine typing in “Obama born in America,” and then having Google filter out the numerous sites with proof of that fact to show only page after page of the conjecture that he wasn’t, because it knew based on individual- (or, more frighteningly, zip-) level data that you were inclined toward conservatism. It’s one thing to expect all the links to be conservative or liberal from your favorite conservative or liberal blog, but another entirely if the primary information shown on a search engine was already filtered to fit your pre-conceived ideas, or what others in your “group” want your ideas to be.
I’ve listened to, and enjoyed, Wagner’s music. I have no idea what a “paint sprayer” is. Does this mean I’m not a guy?
ed, yes.
Google’s Autofill Choice feature for searches is interesting, in part because sometimes Google demonstrates political animus there. For example, back in January, Google refused in response to
“Pat Bu”
to offer as an autofill choice
“Pat Buchanan”
Instead, they offered
Pat Burrell
Pat bus schedule
Pat Buttram
Pat Burrell stats
Pat Burns
Pat Burrell wife
Pat Burke
Pat Buckley Moss
Pat Buckley
Pat Burns cancer
Meanwhile, Bing and Yahoo offered “Pat Buchanan” as the first autofill choice. Eventually, Google got embarrassed at demonstrating blatant political bias and now Pat Buchanan is second on the autofill list:
http://isteve.blogspot.com/2010/01/google-no-like.html
meter – the lack of pornographic search terms just as likely means that tastes in porn aren’t that different between identifiable demographics.
Stuart Spencer? I thought it was the movie with Dr. House.