Category: Data Source

Is academic writing getting harder to read?

To track academic writing over time, The Economist analysed 347,000 PhD abstracts published between 1812 and 2023. The dataset was produced by the British Library and represents a majority of English-language doctoral theses awarded by British universities. We scored each abstract using the Flesch reading-ease test, which uses sentence and word length to gauge readability. A score of 100 roughly indicates that a passage can be understood by someone who has completed fourth grade in America (usually aged 9 or 10), while a score below 30 is considered very difficult to read. An average New York Times article scores around 50 and a CNN article around 70. This article scores 41…
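For readers who want to experiment: the Flesch formula is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). Here is a minimal Python sketch; the syllable counter is a crude vowel-group heuristic (real implementations rely on pronunciation dictionaries):

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count vowel groups; drop a silent final 'e'.
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))

print(flesch_reading_ease("The cat sat on the mat."))   # very easy
print(flesch_reading_ease(
    "Epistemological heterogeneity problematizes "
    "interdisciplinary commensurability."))             # very hard
```

Note that the scale is open-ended: very simple text can score above 100 and dense jargon can go negative, even though typical prose falls between 0 and 100.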

We found that, in every discipline, abstracts have become harder to read over the past 80 years. The shift is most stark in the humanities and social sciences (see chart), with average Flesch scores falling from around 37 in the 1940s to 18 in the 2020s. From the 1990s onwards, those fields went from being substantially more readable than the natural sciences—as you might expect—to just as complicated. Ms Louks’s abstract had a reading-ease rating of 15, still more readable than a third of all those analysed.

Here is more from The Economist, via the excellent Samir Varma.

The importance of transportation for productivity

We quantify the aggregate, regional and sectoral impacts of transportation productivity growth on the US economy over the period 1947-2017. Using a multi-region, multi-sector model that explicitly captures produced transportation services as a key input to interregional trade, we find that the calibrated change in transportation productivity had a sizable impact on aggregate welfare, magnified by a factor of 2.3 compared to its sectoral share in GDP. The amplification mechanism results from the complementarity between transport services and tradable goods, interacting with sectoral and spatial linkages. The geographical implications are highly uneven, with the West and Southwest benefiting the most from market access improvements while the Northeast experiences a decline. Sectoral impacts are largest in transportation-intensive activities like agriculture, mining and heavy manufacturing. Our results demonstrate the outsized and heterogeneous impact of the transportation sector in shaping US economic activity through specialization and spatial transformation.

That is from a recent NBER working paper by A. Kerem Coşar, Sophie Osotimehin & Latchezar Popov.

Unconventional Indicators of National Aspiration

What are your top indicators of national aspiration? Percentage of GDP devoted to R&D would be a good conventional indicator. What about some unconventional indicators? My top five:

1) Top marginal tax rate
2) Space Program
3) Distance to travel to mother’s home
4) Tallest statue
5) Cultural exports

On these, the US and India perform well. India leads on tallest statue and its space program is impressive for a developing country. Its cultural exports are currently low but were historically high; I would not be surprised by a rebound. Many eastern European countries, such as Hungary and Romania, have flat taxes with top rates of 10-15%. Israel has a space program.

I am always surprised by how little people tend to move from the family home. In the US:

…80% of young adults migrate less than 100 miles from where they grew up. 90% migrate less than 500 miles. Migration distances are shorter for Black and Hispanic individuals and for those from low-income families.

If anything, this seems to have fallen in the US, despite moving being much easier today than in the past.

Your unconventional indicator?

Hat tip: Connor.

Why are Top Scientists Leaving Harvard?

Harvard Magazine has an excellent interview with three scientists, Michael Mina, Douglas Melton and Stuart Schreiber, all highly regarded in the life sciences, who have recently left Harvard for the private sector.

Why did they leave? Mina tells an incredible story of what happened during the pandemic. At the time, Mina was a faculty member at the Chan School of Public Health; he was extremely active in advising governments on the pandemic, and he was bringing Harvard millions of dollars a year in funding. But when he tried to hire someone for his lab, the university refused because there was a hiring freeze! Sorry, no hiring for pandemic research during a pandemic. In my talk on US Pandemic Policy I discuss the similar failure of the Yale School of Public Health and how miraculously and absurdly Tyler stepped in to save the day. The rot is deep.

Melton also notes the difference in speed of response between the public and private/commercial sector:

Polls have shown that principal investigator biologists now spend up to 40 percent of their time—it’s a shocking number, 40 percent of their time—writing grants.

In industry, the funding allows for very rapid change. There’s no writing a grant and waiting six months to see if it could get funded, and then waiting another six months for the university to make arrangements to receive the funds. The speed with which you can move into a new area is not comparable.

Years ago, the pharmaceutical industry rarely did discovery research. But now, pharmaceutical companies do basic science. That’s been a good shift, in my opinion, but it’s been a shift.

Everything gets done much quicker. For example, when you want to file for a patent at a company, the next morning there are two patent attorneys in your office ready to write that patent. The computational resources, the sequencing, the chemical screening— it’s not comparable to what we can do in any university. It’s a whole order of magnitude different.

Our last hire at GMU took well over a year to complete. It’s outrageous. There are no functional reasons why universities should be so slow. Don’t forget, Harvard has an endowment of $50 billion!

Melton also asks whether a new private-public partnership model is possible:

Why can’t we find a way—since many of our undergraduates and graduates will end up working in industry—why can’t we find a way for them to do their studies and their Ph.D. and their postdoctoral work in conjunction with Harvard, with MIT, and with Vertex? There are reasons for that, but we haven’t been imaginative enough to think about a compromise.

Hat tip: R.P.

Thomas Storrs on elastic data supply (from my email)

Regarding your post yesterday, Are LLMs running out of data?: the National Archives has 13.5 billion pieces of paper, of which only 240 million are digitized. Some of this is classified or otherwise restricted, but surely we can do better than the current rate of under 2%.

NARA says it aims to increase this to 500 million by the end of 2026. Even with an overly generous $1/record estimate, it makes sense to me for someone to digitize much of the remaining 13 billion, though the incentives are tricky for private actors. Perhaps a consortium of AI companies could make it work. It’s a pure public good, so I would be happy with a federal appropriation.

Admittedly, I have a parochial interest in seeing specific parts digitized, namely those on mid-20th-century federal housing policy. Nonetheless, the supply of data for AI is clearly elastic and there is some delicious low-hanging fruit available.

The National Archives is probably the biggest untapped source of extant data, but there are hundreds of billions more pages around the world.

Technological Disruption in the US Labor Market

Deming, Ong and Summers have a good overview of long-run and very recent changes in the US labor market. Using a measure of occupational titles the authors find:

The years spanning 1990-2017 were the most stable period in the history of the US labor market, going back nearly 150 years.

It’s a bit too early to distinguish an AI revolution from a COVID shock, but the last four years look to be more disruptive than any since the 1970s. Over a slightly longer period there are clear trends, including a decline in retail, as consumers shift to online shopping and delivery, and a decline in office work, the latter especially suggesting an AI effect:

There were 850,000 fewer retail sales workers in the US in 2023 compared to 2013 even though the US economy added more than 19 million jobs over this period.

There are nearly five hundred thousand fewer secretaries and administrative assistants in the US labor force now than there were a decade ago. At the same time, management and business occupations have grown very rapidly. There were four million more managers and 3.5 million more business and financial operations jobs in the US in 2023 than there were in 2013.

Keep in mind that these changes are occurring as employment and wages overall are rising.

The Effects of Gender Integration on Men

Evidence from the U.S. military:

Do men negatively respond when women first enter an occupation? We answer this question by studying the end of one of the final explicit occupational barriers to women in the U.S.: in 2016, the U.S. military opened all positions to women, including historically male-only combat occupations. We exploit the staggered integration of women into combat units to estimate the causal effects of the introduction of female colleagues on men’s job performance, behavior, and perceptions of workplace quality, using monthly administrative personnel records and rich survey responses. We find that integrating women into previously all-male units does not negatively affect men’s performance or behavioral outcomes, including retention, promotions, demotions, separations for misconduct, criminal charges, and medical conditions. Most of our results are precise enough to rule out small, detrimental effects. However, there is a wedge between men’s perceptions and performance. The integration of women causes a negative shift in male soldiers’ perceptions of workplace quality, with the effects driven by units integrated with a woman in a position of authority. We discuss how these findings shed light on the roots of occupational segregation by gender.

That is all from Kyle Greenberg, Melanie Wasserman & E. Anna Weber.

Gender Composition and Group Behavior

Evidence from city councils:

How does gender composition influence individual and group behavior? To study this question empirically, we assembled a new, national sample of United States city council elections and digitized information from the minutes of over 40,000 city-council meetings. We find that replacing a male councilor with a female councilor results in a 25 p.p. increase in the share of motions proposed by women. This is despite causing only a 20 p.p. increase in the council female share. The discrepancy is driven, in part, by behavioral changes similar to those documented in laboratory-based studies of gender composition. When a lone woman is joined by a female colleague, she participates more actively by proposing more motions. The apparent changes in behavior do not translate into clear differences in spending. The null finding on spending is not driven by strategic voting; however, preference alignment on local policy issues between men and women appears to play an important role. Taken together, our results both highlight the importance of nominal representation for cultivating substantive participation by women in high-stakes decision making bodies; and also provide evidence in support of the external validity of a large body of laboratory-based work on the consequences of group gender composition.

That is from a new NBER working paper by Emilia Brito Rebolledo, Jesse Bruhn, Thea How Choon & E. Anna Weber. Those results are also consistent with my anecdotal observations.

Using AI to analyze changes in pedestrian traffic

That is the topic of my latest Bloomberg column, here is one bit:

Fortunately, there is new research. We have entered the age where innovative methods of measurement, such as computer vision and deep learning, can reveal how American life has changed.

Researchers at the National Bureau of Economic Research compiled footage of four urban public spaces, two in New York and one each in Philadelphia and Boston, from 1979-1980 and again in 2008-2010. These snapshots of American life, roughly 30 years apart, reveal how changes in work and culture might have shaped the way people move and interact on the street.

The videos capture people circulating in two busy Manhattan locations, in Bryant Park in midtown and outside the Metropolitan Museum of Art on the Upper East Side; around Boston’s Downtown Crossing shopping district; and on Chestnut Street in downtown Philadelphia. One piece of good news is that at least when it comes to our street behavior, we don’t seem to have become more solitary. From 1980 to 2010 there was hardly any change in the share of pedestrians walking alone, rising from 67% to 68%.

A bigger change is that average walking speed rose by 15%. So the pace of American life has accelerated, at least in public spaces in the Northeast. Most economists would predict such a result, since the growth in wages has increased the opportunity cost of just walking around. Better to have a quick stroll and get back to your work desk.

The biggest change in behavior was that lingering fell dramatically. The amount of time spent just hanging out dropped by about half across the measured locations. Note that this was seen in places where crime rates have fallen, so this trend was unlikely to have resulted from fear of being mugged. Instead, Americans just don’t use public spaces as they used to. These places now tend to be for moving through, to get somewhere, rather than for enjoying life or hoping to meet other people. There was especially a shift at Boston’s Downtown Crossing. In 1980, 54% of the people there were lingering, whereas by 2010 that had fallen to 14%.

Consistent with this observation, the number of public encounters also fell. You might be no less likely to set off with another person in tow, but you won’t meet up with others as often while you are underway. The notion of downtown as a “public square,” rife with spontaneous or planned encounters, is not what it used to be.

I prefer the new arrangements, but of course not everybody does. The researchers are Arianna Salazar-Miranda, Zhuangyuan Fan, Michael B. Baick, Keith N. Hampton, Fabio Duarte, Becky P.Y. Loo, Edward L. Glaeser & Carlo Ratti.

Some Simple Economics of the Google Antitrust Case

The case is straightforward: Google pays firms like Apple billions of dollars to make its search engine the default. (N.B. I would rephrase this as Apple charges Google billions of dollars to make its search engine the default–a phrasing which matters if you want to understand what is really going on. But set that aside for now.) Consumers, however, can easily switch to other search engines by downloading a different browser or changing their default settings. Why don’t they? Because for most users switching is not worth even the minor transaction costs. Moreover, if Google provides the best search experience, most users have no incentive to switch.

Consequently, any potential harm to consumers is limited to minor switching costs, and any remedies should be proportionate. Proposals such as forcing Google to divest Chrome or Android are vastly disproportionate to the alleged harm and risk being counterproductive. Google’s Android has significantly increased competition in the smartphone market, and ChromeOS has done the same for laptops. Google has invested billions in increasing competition in its complements. Google was able to make these investments because they paid off in revenue to Google Search and Google Ads. Kill the profit center and kill the incentive to invest in competition. Unintended consequences.

I argued above that consumer harm is limited to minor switching costs. The plaintiffs counter-argue that Google’s purchase of default status “forecloses” competitors from achieving the scale necessary to compete effectively. This argument relies on network effects – more searches improve search quality through better data. However, this creates a paradox: on the plaintiffs’ theory, two or three firms each operating at smaller scale would be worse than one operating at large scale. For the plaintiffs’ argument to hold, it must be shown that we are at the exact sweet spot where the benefits of increased competition in lowering the price of advertising outweigh the efficiency losses to consumers in search quality from reduced scale. Yet there is no evidence in the case—nor even an attempt—to demonstrate that we are at such a sweet spot.

Or perhaps the argument is that with competition we would get even better search. But that argument can’t be right because the costs of switching, as noted above, are bounded by minor transaction costs. Thus, if a competitor could offer better search, it would easily gain scale (e.g. AI search, see below).

Traditional foreclosure analysis requires showing both substantial market closure and consumer harm. Given the ease of switching and the complex relationship between scale and search quality, proving such harm becomes challenging and my read is that the plaintiffs didn’t prove harm.

In my view, the best analogy to the Google antitrust case is Coke and Pepsi battling for shelf space in the supermarket. Returning to my earlier parenthetical point, Apple is like the supermarket charging for prime shelf placement. Is this a significant concern? Not really. Eye-level placement matters to Coke and Pepsi but by construction they are competing for consumers who don’t much care which sugary, carbonated beverage they consume. For consumers who do care, the inconvenience is limited to reaching to a different shelf.

To add insult to injury, the antitrust case is happening when Google is losing advertising share and is under pressure from a new search technology, Artificial Intelligence. AI search from OpenAI, Anthropic, Meta Llama, and xAI is very well funded and making rapid progress. Somehow AI search did manage to achieve scale! As usual, the market appears more effective than the antitrust authorities at creating competition. 

Addendum: Admittedly this is outside the remit of the judge, but the biggest anti-competitive activities in tech are probably government policies that slow the construction of new power generation, new power lines, and new data centers, and that impede deals between power generators and data centers. I’d prefer the government take on its own anticompetitive effects before going after extremely successful tech companies that have clearly made consumers much better off.

Literacy Rates and Simpson’s Paradox

Max T. at Maximum Progress shows that between 1992 and 2003 US literacy rates fell dramatically within every single educational category, yet the aggregate literacy rate didn’t budge. A great example of Simpson’s Paradox! The easiest way to see how this is possible is to imagine that no one’s literacy level changes but everyone moves up an educational category. The result is no change in aggregate literacy but falling average literacy within each category.
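The arithmetic is easy to verify. A toy example with made-up numbers, in which every individual keeps the same literacy score but moves up one credential tier:

```python
def aggregate(groups):
    # Population-weighted mean literacy across categories.
    total = sum(score * n for score, n in groups.values())
    people = sum(n for _, n in groups.values())
    return total / people

# Hypothetical (mean literacy score, number of people) per category.
before = {"no_degree": (200, 50), "high_school": (250, 40), "college": (300, 10)}
# No individual's score changes, but everyone gains one credential tier.
after = {"high_school": (200, 50), "college": (250, 40), "graduate": (300, 10)}

print(aggregate(before), aggregate(after))  # 230.0 230.0 -- aggregate unchanged
# Yet within each shared category, average literacy fell:
# high_school: 250 -> 200, college: 300 -> 250
```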

Two interesting things follow. First, this is very suggestive of credentialing and the signaling theory of education. Second, and more originally, Max suggests that total factor productivity has likely been mismeasured. Total factor productivity tells us how much more output we can get from the same inputs. If inputs increase, we expect output to increase, so to measure TFP we must subtract any increase in output due to greater inputs. It’s common practice, however, to use educational attainment as a (partial) measure of skill or labor quality. If rising educational attainment is just rising credentialism, then this overestimates the increase in output due to labor skill and underestimates the gain in TFP.
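To see the mechanics: TFP growth is measured as a residual, g_TFP = g_Y − α·g_K − (1 − α)·(g_L + g_q), where g_q is labor-quality growth, often imputed from educational attainment. A sketch with purely hypothetical growth rates shows how overstating g_q understates the residual:

```python
# Growth-accounting residual: g_TFP = g_Y - a*g_K - (1 - a)*(g_L + g_q).
# All growth rates below are hypothetical, chosen only for illustration.
alpha = 0.33                           # capital share of income
g_Y, g_K, g_L = 0.030, 0.040, 0.010    # output, capital, raw-labor growth

g_q_credential = 0.008  # quality growth imputed from rising attainment
g_q_true = 0.002        # true skill growth if attainment is partly credentialism

def tfp_growth(g_q):
    return g_Y - alpha * g_K - (1 - alpha) * (g_L + g_q)

print(tfp_growth(g_q_credential))  # residual looks small when g_q is inflated
print(tfp_growth(g_q_true))        # true residual is larger
```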

This does not imply that we are actually richer than measured (output is what it is), but it does imply that if we want to know why we haven’t grown richer as quickly as we did in the past, we should direct less attention to ideas and TFP and more attention to the failure to truly increase human capital.

Unauthorized Immigration and Local Government Finances

This paper examines how unauthorized immigration affects the fiscal health of local governments in the United States. Using detailed data on unauthorized immigrants’ countries of origin and arrival dates from the Syracuse TRAC database, we isolate immigration flows driven by social, economic, and political conditions in source countries. We predict local immigration patterns using a shift-share instrument based on pre-existing foreign-born population distributions. We find that the economic effects of unauthorized immigration depend crucially on local labor market conditions. In areas with structurally tight labor markets—characterized by low unemployment and low labor force participation—unauthorized immigration correlates with lower municipal bond yields. However, areas with typical labor market conditions experience higher yields. Areas with “sanctuary” status also experience higher yields when exposed to unauthorized immigration. These yield effects reflect underlying economic mechanisms: unauthorized immigration predicts loosening of labor markets in areas where they were previously tight, whereas in sanctuary areas, unemployment rates increase by more than twice as much. We find unauthorized immigration explains higher expenditures on local public amenities, including welfare assistance, construction, education, and law enforcement, but these expenditures are not offset by higher tax revenues. Overall, our results provide novel insights into the local economic effects of unauthorized immigration.

That is from a new paper by Jess Cornaggia, Kimberly Cornaggia, and Ryan D. Israelsen.  Via the excellent Kevin Lewis.
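The shift-share (Bartik) construction the authors describe predicts local inflows by interacting each area’s pre-period share of an origin country’s foreign-born population with national inflows from that origin. A minimal sketch with entirely hypothetical numbers:

```python
# Predicted inflow for a locality = sum over origin countries of
# (pre-period local share of that origin's US foreign-born stock)
# x (national inflow from that origin). All figures hypothetical.
shares = {  # locality -> {origin: share of that origin's national stock}
    "city_a": {"mexico": 0.10, "india": 0.02},
    "city_b": {"mexico": 0.01, "india": 0.15},
}
national_inflows = {"mexico": 50_000, "india": 30_000}

predicted = {
    city: sum(s * national_inflows[origin] for origin, s in origin_shares.items())
    for city, origin_shares in shares.items()
}
print(predicted)  # city_a ~ 5600, city_b ~ 5000
```

Because the local shares are fixed before the inflows occur, the predicted values capture exposure to national shocks rather than local demand conditions, which is what gives the instrument its claim to exogeneity.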