Collaborating With People Like Me: Ethnic co-authorship within the US

That is the new NBER paper by Richard B. Freeman and Wei Huang and it is an object lesson in the benefits of cross-cultural collaboration:

This study examines the ethnic identity of the authors of over 1.5 million scientific papers written solely in the US from 1985 to 2008. In this period the proportion of US-based authors with English and European names fell while the proportion of US-based authors with names from China and other developing countries increased. The evidence shows that persons of similar ethnicity co-author together more frequently than can be explained by chance given their proportions in the population of authors. This homophily in research collaborations is associated with weaker scientific contributions. Researchers with weaker past publication records are more likely to write with members of ethnicity than other researchers. Papers with greater homophily tend to be published in lower impact journals and to receive fewer citations than others, even holding fixed the previous publishing performance of the authors. Going beyond ethnic homophily, we find that papers with more authors in more locations and with longer lists of references tend to be published in relatively high impact journals and to receive more citations than other papers. These findings and those on homophily suggest that diversity in inputs into papers leads to greater contributions to science, as measured by impact factors and citations.

I can think of at least two ways of interpreting these results.  First, there are research profit opportunities from finding talented foreign collaborators, who perhaps are still undervalued in their home environments, relative to their total potential global productivity.  Second, the globalization of your connections proxies for how elite you are, even after adjusting for other measures of researcher quality.

This shows the benefits of cross-cultural exchange, be it ideas or genes. Note the authors are of different cultures too. It's well known in mixed-race couples that their offspring excel, taking the best from both parents. Just one example of many: "Bruno Mars was born Peter Gene Hernandez on October 8, 1985, in Honolulu, Hawaii, and was raised in the Waikiki neighborhood of Honolulu. He is the son of Peter Hernandez and Bernadette "Bernie" San Pedro Bayot (died June 1, 2013). His father is of half Puerto Rican and half Eastern European Jewish (from Hungary and Ukraine) descent, and is originally from Brooklyn, New York. Mars' mother immigrated to Hawaii from the Philippines as a child, and was of Filipino, and some Spanish, descent."

The plural of anecdote is still not data.

Language evolves, dude, as does punctuation. Said the person who speaks three languages fluently, and knows a bit of Tagalog to boot.
The Wall Street Journal has just published this blog post, in which it finally decides to move away from data "are", saying:
"Most style guides and dictionaries have come to accept the use of the noun data with either singular or plural verbs, and we hereby join the majority. As usage has evolved from the word's origin as the Latin plural of datum, singular verbs now are often used to refer to collections of information: Little data is available to support the conclusions. Otherwise, generally continue to use the plural: Data are still being collected."

Oh, sorry, now I see your point: you dispute that this data point is representative. OK, but I've yet to see a counterexample, except one time I saw a Japanese woman mate with an English man and their son looked hideous. For all I know he was OK. Usually though girls look great in Asian-Caucasian matings while boys look a little effeminate. And we all know the Greek adage, "strong body, strong mind", which is statistically true (taller, better looking people become CEOs, for example).

Ray: We don't care if you mate with a Filipino village girl.

It is rare to find an important science collaboration at a US research university that isn't international. The homophily will usually come from small non-research universities (where they would tend to be more Euro-American) and from foreign professors who have connections to schools overseas and like taking grad students from those schools (it's not uncommon to have a Chinese pipeline like this). I think Dr. Cowen's assessment is basically correct in the end paragraph.

Totally disagree with your first sentence. I'm sure there are lots of very nice international collaborations but there are tons of very nice to excellent collaborations based entirely in the US. In fact because of the difficulties of including international team members on team-based grant applications to (US) domestic funding agencies, I would think that international collaborations are much less common than they could be.

By international I meant that the ethnicities may be international even if the affiliation is domestic, i.e., foreign grad students and postdocs are very common in US universities.

Ah, well then we agree. Thanks for the clarification.

What's their measure of "ethnic homophily"? All authors of the same ethnicity, or at least two authors of the same ethnicity?

Isn't there a probability argument here? High impact papers tend to have larger teams and hence are less likely to be ethnically homogeneous.

You'd find two-person collaborations ethnically homogeneous a lot more often than 20-person teams.
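
To make the probability argument concrete, here is a rough sketch under a deliberately simple null model: each author's ethnicity is drawn independently from fixed population shares. The shares below are made up for illustration and are not taken from the paper, and the two function names are hypothetical. The sketch computes both candidate definitions of homophily raised above: "all authors share an ethnicity" and "at least two authors share an ethnicity".

```python
from math import factorial

def prob_all_same(shares, n):
    """P(all n authors share one ethnicity) under independent draws."""
    return sum(p ** n for p in shares)

def prob_any_pair_same(shares, n):
    """P(at least two of n authors share an ethnicity) under independent draws."""
    # P(all n authors distinct) = n! * e_n(shares), where e_n is the n-th
    # elementary symmetric polynomial of the shares, built up by a small DP.
    e = [1.0] + [0.0] * n
    for p in shares:
        for k in range(n, 0, -1):
            e[k] += p * e[k - 1]
    return 1.0 - factorial(n) * e[n]

# Hypothetical population shares of five ethnic groups (assumed, sum to 1).
shares = [0.5, 0.2, 0.15, 0.1, 0.05]

for n in (2, 5, 20):
    print(n, prob_all_same(shares, n), prob_any_pair_same(shares, n))
```

For two authors the two definitions coincide. As team size grows they pull apart: the chance that everyone shares an ethnicity shrinks, while the chance that some pair does rises toward one. So which way team size biases the result depends on which definition the paper uses.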

I agree with Albert. This is particularly true in biomedical research.

Didn't control for the changing ethnicity of graduate school departments.

In the graduate school in which I teach on occasion, more than half the graduate candidates in marketing are from foreign countries. If you don't control for the composition of potential co-authors from the same department over time, I don't know how this paper makes sense.

These findings and those on homophily suggest that diversity in inputs into papers leads to greater contributions to science, as measured by impact factors and citations.

Does this really follow? By what mechanism does diversity in nationality lead to stronger papers, especially in science? Maybe "ethnic hemophiliacs" (Is that a word?) tend to produce weaker papers for other, social, reasons.

Perhaps weaker projects are simply more likely to be carried to completion when the authors are friends, somewhat more likely when they share an ethnic background. Or maybe inexperienced researchers, especially foreigners, find it easier to collaborate with fellow nationals. Or perhaps senior academics are more willing to work with a younger fellow national on something that might not be the greatest idea ever, but will help the junior partner's career a bit.

Is it plausible that if Smith, Jones, Wang, and Wei are equally talented researchers that the Smith-Jones and Wang-Wei papers will likely be weaker than Smith-Wang papers, etc.?

If you assume that diversity leads to potentially more conflict (most commonly through difficulties in communication), then it is simply harder to start and complete work with very different coauthors. Marginal pieces are likely to be shunted aside. Work that is rejected early will not be pushed as hard by its authors given difficulties in communication and determination of who should do what. This is a different variant on Cowen's point 2, but it is also a kind of selection bias.

An interpretation here would be that people disprefer working with non-co-ethnics, so don't do so unless they've got something really strong in the pipeline (thus why "heterophilic" papers tend to "outscore").

E.g. if a Togolese is working with a Kyrgyzstani, there will generally be more of an incentive in terms of the quality of the final output, that it'll do more to help his career and move the science forwards, than he would require to work with another Togolese.

Under that interpretation, enforcing heterophily in papers wouldn't actually increase quality, and instead would just decrease hedonic experience and probably quantity.

"Second, the globalization of your connections proxies for how elite you are,"

Right -- you can see this in baseball where the two richest teams, the New York Yankees and the Boston Red Sox regularly have among the most diverse rosters because they can afford to skim the very best players from the Japanese major leagues. For example, the Yankees recently signed a pitcher who went 24-0 in Japan last year to a $155 million contract.

It's widely assumed from these famous examples that more diversity will make your, say, branch office of Dunder Mifflin run better, but that may not actually be true.

In the popular theory of diversity enshrined by the Supreme Court in the 2003 pro-affirmative action decision, the reason the Yankees and Red Sox win pennants would be because the other players must be synergistically learning Japanese insights into baseball in the clubhouse (rather than because the Yankees and Red Sox can afford to hire more world class players than anybody else). Perhaps that's true, but most of the behind-the-scenes accounts I've read suggest that extreme diversity of the top teams lessens clubhouse interactions simply because the Japanese stars can only communicate through their translators.

But clubhouse synergy is mythologized in baseball, while sheer individual skill is, if anything, still underrated. Barry Bonds, for example, never ever gave a tip to a teammate, but he was still pretty valuable as his seven MVP awards attest.

Scientific research is probably more synergistic than baseball, but this study mostly seems to measure how "world class" the research teams are.

Steve, and there are no positive spillovers from these global 'dream teams'? Please. Research progresses by pushing out the frontier and then filling in the details. The more efficiently the best talent are combined, the further and faster we can push that frontier. Everyone benefits: the second-tier researchers who get a good example to follow and more importantly all those people who stand to benefit from the research. I have seen it first hand: the cost of the jobs lost to global talent pales in comparison to the benefit of what they add to the intellectual environment. And if we are all more productive with diverse "world class" teams, we all win ... more wages, more growth, etc. The economy is not zero sum.


But my point is simply that most of us are not members of world class teams like the Yankees or CERN or whatever, so TED talks about how to apply the lessons from world class teams to running your Dunder Mifflin paper products branch office aren't as applicable as everybody assumes.

I disagree. Dunder Mifflin benefits from putting together the best team possible too ... and Dwight was in a class all of his own. Now your point is well taken that the superstars are different, but there are still lessons for all of us. (See #2 of today's Assorted Links.)

We might get better aggregate results from pooling a bunch of teams where there's one superstar and a bunch of second rankers than dream teams full of competing egos and all the second rankers in their own little teams.

I guess the null hypothesis would be of the current allocation being optimal, but we don't know unless it's tested. There's plenty of room for irrationality where multiple talents pile on the same prestigious projects in a redundant and competitive manner.

How do you control for immigration on this?

Do "European" names include Hebrew ones?

Are European names indicative of immigration status?

An article and discussion about ethnic co-authorship and nary a word about the fact that Jewish scholars disproportionately co-author and cite papers with fellow Jewish academics?

we find that papers with more authors in more locations and with longer lists of references tend to be published in relatively high impact journals and to receive more citations than other papers.

Which locations? If South America and Africa aren't proportionally represented, I'm calling Eric Holder.

I don't know about other fields, but major papers in genomics are increasingly being authored by a cast of thousands. For example, here's an article I linked to today:

"Ancient human genomes suggest three ancestral populations for present-day Europeans" by Iosif Lazaridis, Nick Patterson, Alissa Mittnik, Gabriel Renaud, Swapan Mallick, Peter H. Sudmant, Joshua G. Schraiber, Sergi Castellano, Karola Kirsanow, Christos Economou, Ruth Bollongino, Qiaomei Fu, Kirsten Bos, Susanne Nordenfelt, Cesare de Filippo, Kay Prüfer, Susanna Sawyer, Cosimo Posth, Wolfgang Haak, Fredrik Hallgren, Elin Fornander, George Ayodo, Hamza A. Babiker, Elena Balanovska, Oleg Balanovsky, Haim Ben-Ami, Judit Bene, Fouad Berrada, Francesca Brisighelli, George B.J. Busby, Francesco Cali, Mikhail Churnosov, David E.C. Cole, Larissa Damba, Dominique Delsate, George van Driem, Stanislav Dryomov, Sardana A. Fedorova, Michael Francken, Irene Gallego Romero, Marina Gubina, Jean-Michel Guinet, Michael Hammer, Brenna Henn, Tor Helvig, Ugur Hodoglugil, Aashish R. Jha, Rick Kittles, Elza Khusnutdinova, Toomas Kivisild, Vaidutis Kučinskas, Rita Khusainova, Alena Kushniarevich, Leila Laredj, Sergey Litvinov, Robert W. Mahley, Béla Melegh, Ene Metspalu, Joanna Mountain, Thomas Nyambo, Ludmila Osipova, Jüri Parik, Fedor Platonov, Olga L. Posukh, Valentino Romano, Igor Rudan, Ruslan Ruizbakiev, Hovhannes Sahakyan, Antonio Salas, Elena B. Starikovskaya, Ayele Tarekegn, Draga Toncheva, Shahlo Turdikulova, Ingrida Uktveryte, Olga Utevska, Mikhail Voevoda, Joachim Wahl, Pierre Zalloua, Levon Yepiskoposyan, Tatijana Zemunik, Alan Cooper, Cristian Capelli, Mark G. Thomas, Sarah A. Tishkoff, Lalji Singh, Kumarasamy Thangaraj, Richard Villems, David Comas, Rem Sukernik, Mait Metspalu, Matthias Meyer, Evan E. Eichler, Joachim Burger, Montgomery Slatkin, Svante Pääbo, Janet Kelso, David Reich, Johannes Krause

Yes but that's hardly diversity for diversity's sake. These are merely large projects with division of labor. Genome sampling papers in Nature have similar author lists.

Looking at the name of the article, it is perfectly rational to have 2-3 contributors from each European country.

The Atlas collaboration at CERN is publishing papers where the 6-page author list is longer than the scientific content of the paper.

Definitely you see exactly this in computer science and engineering. But I don't think it has so much to do with diversity as a 'good thing'. I think it has more to do with higher ranking schools having much higher expectations for English competency. Groups from top universities tend to publish more cited papers in higher impact journals.

Professors at very top universities tend to lead diverse groups (at least relative to the overall diversity of CSE). A professor of Indian origin will supervise students from China, the US, and Europe, for example. The top schools tend to have quite high English language expectations for both faculty and students, so that language barriers to inter-ethnic collaboration are fairly low. And it leads to stronger English skills for all, which also correlates with grant/publication quality and impact. And if the observed level of ethnic correlation in groups is low (even if not zero), then the expectation will be that all groups are open to all and collaboration nets can be wide, which is self-reinforcing.

Especially at 2nd-tier institutions, you see that many departments are very ethnically balkanized. Professors who graduated from IIT have research groups that are made up only of students who also graduated from IIT, professors from China will only supervise PhD students and postdocs from China, etc. Often, expectations of English competency are lower at 2nd-tier schools, which increases the cost of inter-ethnic cooperation for both faculty and students: so much easier to collaborate in your native language! And of course, once a research group becomes mono-ethnic/non-English based, this is highly self-reinforcing. People working in these groups may not be speaking much English in their workday, which can also affect their English fluency, which in turn affects grant/publication quality and impact. Of course, this is not true of all groups, and many professors welcome students of all backgrounds, but it happens often enough to show up in statistics.

This is possible because even though general graduate admissions are managed by department and university administration, professors choose which students they supervise and provide research support to. My own opinion is that departments should keep much closer reins on this, but it is hard because it does touch on faculty's academic freedom to choose the students they provide research support to.

No, no, no; what counts is exactly how many really exotic co-authors you've had. Any fool will have a list of Indians and Chinese as long as your arm; mine include chaps from Nepal and PNG.

