# Category: Uncategorized

## Big Data+Small Bias << Small Data+Zero Bias

Among experts it’s well understood that “big data” doesn’t solve problems of bias. But how much should one trust an estimate from a big but possibly biased data set compared to a much smaller random sample? In Statistical paradises and paradoxes in big data, Xiao-Li Meng provides some answers which are shocking, even to experts.

Meng gives the following example. Suppose you want to estimate who will win the 2016 US Presidential election. You ask 2.3 million potential voters whether they are likely to vote for Trump or not. The sample is in all ways demographically representative of the US voting population but potential Trump voters are a tiny bit less likely to answer the question, just .001 less likely to answer (note they don’t lie, they just don’t answer).

You also have a random sample of voters where here random doesn’t simply mean chosen at random (the 2.3 million are also chosen at random) but random in the sense that Trump voters are as likely to answer as are other voters. Your random sample is of size n.

How big does n have to be for you to prefer (in the sense of having a smaller mean squared error) the random sample to the 2.3 million “big data” sample? Stop. Take a guess….

The answer is…here. Which is to say that your 2.3 million “big data” sample is no better than a random sample of that number minus 1!

On the one hand, this illustrates the tremendous value of a random sample but it also shows how difficult it is in the social sciences to produce a truly random sample.

Meng goes on to show that the mathematics of random sampling fools us because it seems to deliver so much from so little. The logic of random sampling implies that you only need a small sample to learn a lot about a big population, and if the population is much bigger you only need a slightly larger sample. For example, you only need a slightly larger random sample to learn about the Chinese population than about the US population. When the sample is biased, however, not only do you need a much larger sample, you need it to be large relative to the total population. A sample of 2.3 million sounds big but it isn’t big relative to the US population, which is what matters in the presence of bias.
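For readers who want to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. It is not Meng's derivation, just a direct mean-squared-error comparison under assumptions I am adding for illustration: the true Trump share is 0.5, the 2.3 million respondents are about 1% of the voting population, and ".001 less likely" means an absolute gap of 0.001 in the probability of answering. The function name and parameters are hypothetical.

```python
def equivalent_random_sample_size(true_share, response_rate, response_gap, big_n):
    """Size of a simple random sample whose MSE matches the biased big sample.

    All inputs are illustrative assumptions, not figures taken from the paper.
    """
    # Response probabilities for the two groups, chosen so that the overall
    # response rate stays equal to `response_rate`.
    r_trump = response_rate - response_gap * (1 - true_share)
    r_other = response_rate + response_gap * true_share

    # Expected Trump share among the people who actually answer.
    observed_share = true_share * r_trump / response_rate

    bias = observed_share - true_share
    # Sampling variance of the big sample (tiny next to the squared bias).
    variance = observed_share * (1 - observed_share) / big_n
    mse_big = bias ** 2 + variance

    # A simple random sample of size n has MSE of p(1 - p) / n,
    # ignoring the finite-population correction.
    return true_share * (1 - true_share) / mse_big


n_equivalent = equivalent_random_sample_size(
    true_share=0.5,        # assumed true Trump share
    response_rate=0.01,    # assumed: 2.3 million is about 1% of voters
    response_gap=0.001,    # Trump voters .001 less likely to answer
    big_n=2_300_000,
)
print(round(n_equivalent))
```

Under these assumptions the equivalent random sample comes out at roughly 400 respondents. Almost all of the big sample's error is squared bias, which does not shrink as more biased answers are added; that is why the size of the sample relative to the population matters, not its raw size.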

A more positive way of thinking about this, at least for economists, is that what is truly valuable about big data is that there are many more opportunities to find random “natural experiments” within the data. If we have a sample of 2.3 million, for example, we can throw out huge amounts of data using an instrumental variable and still have a much better estimate than from a simple OLS regression.

## Gangs really matter

We study the effects that two of the largest gangs in Latin America, MS-13 and 18th Street, have on economic development in El Salvador. We exploit the fact that the emergence of gangs in El Salvador was in part the consequence of an exogenous shift in US immigration policy that led to the deportation of gang leaders from the United States to El Salvador. Using the exogenous variation in the timing of the deportations and the boundaries of the territories controlled by the gangs, we perform a spatial regression discontinuity design and a difference-in-differences analysis to estimate the causal effect that living under the rule of gangs has on development outcomes. Our results show that individuals living under gang control have significantly worse education, wealth, and less income than individuals living only 50 meters away in areas not controlled by gangs. None of these discontinuities existed before the arrival of gangs from the US. The results are not determined by exposure to violence, lower provision of public goods, or selective migration away from gang locations. We argue that our findings are mostly driven by gangs restricting residents’ mobility and labor choices. We find that individuals living under the rule of gangs have less freedom of movement and end up working in smaller firms. The results are relevant for many developing countries where non-state actors control parts of the country.

That is from a new paper by Nikita Melnikov, Carlos Schmidt-Padilla, and Maria Micaela Sviatschi.  Via the excellent Samir Varma.

## Urban growth and its aggregate implications

That is the title of a new paper by Gilles Duranton and Diego Puga.  This piece goes considerably beyond previous research by having a more explicit model of both urban-rural interactions and the possible congestion costs arising from more YIMBY.  Here are a few results of the paper:

1. If you restricted New York City and Los Angeles to the size of Chicago, 18.9 million people would be displaced and per capita rural income would fall by 3.6%, due to diminishing returns to labor in less heavily populated areas.

2. The average reduction in real income per person, from this thought experiment, would be 3.4%.  You will note that NIMBY policies are in fact running a version of this policy, albeit at different margins and with a different default status quo point.

3. If you were to force America’s 11 largest cities to be no larger than Miami, real income per American would fall by 7.9%.

4. If planning regulations were lifted entirely, NYC would reach about 40 million people, Philadelphia 38 million (that’s a lot of objectionable sports fans!), and Boston just shy of 30 million (ditto).

5. Output per person, under that scenario, would rise in NYC by 5.7% and by 13.3% in Boston.  That said, under this same scenario incumbent New Yorkers would see net real consumption losses of 13%, whereas for Boston the incumbent losses are only about 1.1%.

6. The big winners are the new entrants.  On average, real income would rise by 25.7%.

7. Alternatively, in their model, rather than laissez-faire, if America’s three most productive cities relaxed their planning regulations to the same level as the median U.S. city, real per capita income would rise by about 8.2%.

8. In all of these cases the authors calculate the change in rural per capita income, based on resulting population reallocations.

Recommended, I am very glad to see more serious work in this area.

## The U.S. cultural elite

Overall I do not regard this as good news:

We examine the educational backgrounds of more than 2,900 members of the U.S. cultural elite and compare these backgrounds to a sample of nearly 4,000 business and political leaders. We find that the leading U.S. educational institutions are substantially more important for preparing future members of the cultural elite than they are for preparing future members of the business or political elite. In addition, members of the cultural elite who are recognized for outstanding achievements by peers and experts are much more likely to have obtained degrees from the leading educational institutions than are those who achieve acclaim from popular audiences.

That is from a new paper by Steven Brint et al., via the excellent Kevin Lewis.

## The economic importance of the Middle East?

Or lack thereof?:

Even beyond Iran, the region scarcely registers on multinationals’ profit-and-loss statements. The Middle East and Africa accounted for 2.4% of listed American firms’ revenues in 2019, according to Morgan Stanley, a bank. For European and Japanese companies it was 4.9% and 1.8%, respectively. Middle Easterners still buy comparatively few of the world’s cars (2.3m out of 86m sold globally in 2018). Peddlers of luxury goods like Prada, an Italian fashion house, and L’Oréal, a French beauty giant, book 3% of sales in the Middle East (not counting sheikhs’ shopping trips to Milan or Paris).

The overall regional footprint of Western finance appears equally slight. At the end of 2018 big American banks had \$18.5bn-worth of credit and trading activity in the region, equivalent to 0.2% of their assets. This includes JPMorgan Chase’s \$5.3bn business in Saudi Arabia and Citigroup’s \$9.6bn exposure to the United Arab Emirates (UAE). European banks have, if anything, been retreating. BNP Paribas of France sold its Egyptian business seven years ago and earned a footling €121m (\$143m) in the Middle East in 2018. HSBC reports a substantial \$58.5bn in Middle Eastern assets, though that is still a rounding error in the British lender’s \$2.7trn balance-sheet.

Here is more from The Economist, noting that energy does play a larger role in other economic sectors.  If there is any part of the world that could use more multinational activity…

## Places to go in 2020

Here is the mostly dull NYT list.  Here is my personal list of recommendations for you, noting I have not been to all of the below, but I am in contact with many travelers and paw through a good deal of information:

1. Pakistan, and Pakistani Kashmir.  Finally it is safe, and in some way it is easier to negotiate than India.  The best dairy products I have eaten in my life, and probably it is the most populous country you have not yet seen, or maybe Nigeria, but that makes the list too.  Islamabad is nicer than any city in India, and watch the painter trucks on the nearby highway.

2. Eastern Bali.  Still mostly unspoilt, the perfect mix of exoticism and comfort.  This island is much, much more than Elizabeth Gilbert, yoga, and hippie candles.

3. Lalibela, Ethiopia.  Has some of my favorite churches, beautiful vistas and super-peaceful, and the high altitude of Lalibela and Addis means you don’t have to take anti-malarials.  I know a good guide there, here are my Lalibela posts.  The central bank forecasts 10.8% growth for the country for next year, so Lalibela is likely to change rapidly.

4. Lagos, Nigeria.  A bit dangerous, but immense fun, wonderful music every night, and not nearly as bad as you might be thinking.  Africa’s most dynamic city by far and a new modern civilization in the works.  Here are my earlier Lagos posts, including travel tips.

5. Odisha [Orissa], India.  Sometimes called India’s most underrated cuisine, that is enough reason to go and so now it is on my list for myself.

6. Sumatra, Indonesia.  Surely a good place to understand the evolution of Islam, and supposedly home to Indonesia’s best food.  I hope to get there soon.  First-rate textiles and lake views, I hear.

7. Warsaw, Poland.  No, not a fascist country (though objectionable in some regards), and rapidly becoming the center of opportunity for eastern Europe and a major player in the European Union.  First-rate food and dishes you won’t get elsewhere, at least nothing close to comparable quality.  Nice for walking, don’t expect too many intact old buildings, but isn’t it thrilling to see a major part of Europe growing at four percent?

8. Baku, Azerbaijan.  The world’s best seaside promenade, and wonderful textiles and food, in the Iranian direction, here are my travel notes.  Feels exotic, yet safe and orderly as well.

9. Macedonia, or anywhere off the beaten track in the former Yugoslavia.  Then think about the history and politics of where you are at, and then think about it some more.

10. Quito, Ecuador.  One of the world’s loveliest cities, including the church, wonderful potatoes and corn for vegetarians too.  There are some iPhone snatchers, but overall safe to visit.  Very good day trips as well, including to the “Indian market” at Otavalo and volcano Cotopaxi.

3. No, don’t read this piece, but in the meantime I will note I can no longer tell what is satire.

## The new culture of matching

When Jenica Andersen felt the tug for a second child at age 37, the single mom weighed her options: wait until she meets Mr. Right or choose a sperm donor and go it alone.

The first option didn’t look promising. The idea of a sperm donor wasn’t appealing, either, because she wanted her child to have an active father, just like her 4-year-old son has. After doing some research, Ms. Andersen discovered another option: subscription-based websites such as PollenTree.com and Modamily that match would-be parents who want to share custody of a child without any romantic expectations. It’s a lot like a divorce, without the wedding or the arguments.

Kris on Twitter asks that question.  I have a few hypotheses, none confirmed by any hard data, other than my “lyin’ eyes”:

1. Twitter exists as a kind of parallel truth/falsehood mechanism, and it is encroaching on traditional academic processes, for better or worse.

2. Hypotheses blaming people or institutions for failures and misdeeds will be more popular on Twitter than in academia, but over time they are spreading in academia too, in part because of their popularity on Twitter.  Blame makes for a more popular tweet.

3. Often the number of Twitter followers resembles a power law, and thus Twitter raises the influence of very well known contributors.  Twitter also raises the influence of the relatively busy, compared to say the 2009 world where blogs held more of that influence.  Writing blog posts required more time than does issuing tweets.

4. I believe Twitter raises the relative influence of women.  For one thing, women can coordinate with each other on Twitter more easily than they can in academic life across different universities.

5. Twitter can damage the career prospects of some of the more impulsive tweeting white males.

6. On Twitter it is easier to judge people by their (supposed) intentions than in academia, so many more people will be accused of acting and writing in bad faith.

7. On Twitter more people do in fact act in bad faith.

8. Hardly anyone looks better on Twitter, so that contributes to the polarization of many professions, especially economics and those professions linked to political issues.  Top economists don’t seem so glamorous any more, not even in their areas of specialization.

9. Academic fields related to current events will rise in status and attention, and those topics will garner the power law retweets.  Right now that means political science most of all, but of course this will vary over time.