Category: Web/Tech
How well does current AI find errors in economics papers?
Can artificial intelligence (AI) refute economic theory? I document experiments in which I asked several AI models (Gemini, Refine, Claude, and ChatGPT) to check the correctness of four published papers in economic theory, each containing an error that I helped identify or correct. ChatGPT Pro performed best, occasionally constructing counterexamples and corrected proofs, while other models fared worse. However, no model located a true error without substantial human guidance, and data contamination complicates interpretation. I argue that a competent human paired with a frontier model can outperform current peer review, but AI cannot yet refute economic theory on its own.
That is from a new piece by Alexis Akira Toda.
New paper on the iPhone and fertility
The U.S. general fertility rate has fallen by 22% since 2007, a sustained decline not readily explained by economic conditions, contraceptive use, housing or childcare costs, or other commonly cited factors. We assess the potential role of a different shock: the diffusion of the smartphone. The U.S. rollout of the iPhone, the first modern smartphone, provides a natural experiment: from June 2007 through February 2011, the device was sold only on AT&T, allowing us to identify its effect from variation in AT&T’s mobile broadband coverage. Entropy-balanced Poisson and synthetic difference-in-differences event studies imply that access to the iPhone reduced births by 4.5–8.0% at ages 15–19 and 3.2–6.6% at ages 20–24, with statistically significant but smaller declines among older cohorts. Placebo analyses applied to Verizon and Sprint’s pre-2011 coverage footprint are null. Taken together, these cohort effects imply that the diffusion of the iPhone deepened the decline in births among women under 30 while suppressing the rise in births among older women. Overall, the diffusion of the iPhone explains 33–52% of the decline in the general fertility rate among women aged 15–44. National-survey evidence on time use and sexual behavior is consistent with the iPhone reducing in-person interactions, increasing pornography use, and reducing sexual frequency.
That is from
Note also that, as this study is set up, it does not discriminate against the “the iPhone effect on fertility is mainly a thing of timing” hypothesis. And a Paul Novosad comment.
Might AI hurt corporate profits? (from my email)
From Clifford Sosin:
I loved your talk about AI and wanted to bounce an idea off you.
I think AI may be bad for corporate profit margins.
A lot of companies make money because their customers can’t be bothered to monitor them more closely, or to insource something. Customers let the company make some money in exchange for doing a decent-enough job and making the problem go away.
Bank of America has $2 trillion of deposits, not a penny of which is optimized. Most enterprise software vendors could be switched out far more often, or displaced by home-built software, but it’s too much of a pain. I could run a 12-party RFP for an Uber ride or a pair of socks, but I don’t.
In a sense, many professionals are an extension of the same idea. I could research my own real estate law, or my own insurance, whether business or personal, but I don’t because it would be too hard.
Google Search might be the biggest example. It makes money because advertisers know they need to be at the top of the results to be found. But my agent will happily search all the results across multiple search engines.
AI agents should change all this. By acting as incredibly rational and vigilant sourcing agents, CFOs, and experts for their users, they will take rents previously collected by these toll-takers and redistribute them to consumers.
And I don’t think the AI stack itself necessarily makes much profit. Commodity and open-weight models are hot on the heels of the major model companies, and competition in GPUs should intensify. Indeed, making a GPU is in some ways similar to making software, so perhaps it can commoditize substantially. Chip manufacturing may remain high-margin, but there are now plenty of entrants drawn in by the shortage who could make TSMC’s market more competitive over time.
Some companies will win. Low-cost providers may gain share as customers switch more often. Richer consumers may consume more high-end goods. Companies with genuinely advantaged business models and limited competition will be able to become more efficient. But my overriding sense is that the equilibrium outcome is lower margins for companies.
Of course, people will build new businesses, and maybe they will use AI to generate very high margins in ways I haven’t considered. That would prove me wrong.
But if this lower-margin hypothesis is true, the knock-on effects are probably positive for AI adoption, since it will make the models more popular with consumers.
And if your view is that AI drives GDP growth to be only 5–10% higher over the next decade, it’s possible that a 100–200 bp decline in corporate margins from roughly 12% would mean companies in aggregate don’t see much benefit — or in fact lose — even as consumers are better off.
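To put rough numbers on that last paragraph, here is a minimal sketch of the arithmetic. It assumes (my assumption, not Sosin's) that aggregate corporate revenue scales one-for-one with GDP, and it reads "5–10% higher" as the level of GDP at the end of the decade. The function name `profit_change` is purely illustrative.

```python
# Back-of-the-envelope check: profits = revenue * margin.
# Assumption: revenue rises in proportion to GDP (not stated in the email).

def profit_change(gdp_gain, margin_drop_bp, base_margin=0.12):
    """Fractional change in aggregate profits.

    gdp_gain        -- extra GDP (and, by assumption, revenue), e.g. 0.05 for 5%
    margin_drop_bp  -- decline in profit margin, in basis points
    base_margin     -- starting margin, roughly 12% per the email
    """
    new_margin = base_margin - margin_drop_bp / 10_000
    return (1 + gdp_gain) * new_margin / base_margin - 1


if __name__ == "__main__":
    for gdp_gain in (0.05, 0.10):
        for drop_bp in (100, 200):
            change = profit_change(gdp_gain, drop_bp)
            print(f"GDP +{gdp_gain:.0%}, margin -{drop_bp} bp: "
                  f"profits {change:+.1%}")
```

Under these assumptions, only the most favorable corner (10% more GDP, 100 bp less margin) leaves aggregate profits slightly higher; the other three combinations leave them lower, which is the point of the email.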
Is work from home bad for your mental health?
From the “Results” section:
Relative to those in nonremotable jobs, workers in remotable jobs spent approximately one additional hour alone per workday after the pandemic. Those in remotable jobs also differentially increased days spent entirely alone and decreased after-work socializing. The rise in isolation was sharpest for those living alone, whose likelihood of spending the whole day without social contact rose by 7 percentage points (83%).
Mental distress simultaneously increased: Scores on the Kessler (K-6) measure of generalized psychological distress rose by 0.1 standard deviations for those in remotable jobs relative to those in nonremotable jobs. The increase in distress was roughly twice as large for those living alone compared with those living with family. Alternative measures of mental distress—such as the frequency of depression, mental health care utilization, and antidepressant prescriptions—show similar trends. In contrast, workers in remotable jobs did not differentially increase visits to non–mental health care providers or non–mental health prescriptions (statins, for example), suggesting that the change was not merely driven by increased flexibility for doctor visits.
That is from a recently published paper by Natalia Emanuel, Emma Harrington, and Amanda Pallais.
Barter markets in everything
A clean house in return for your data?:
We record first-person cleaning footage to help train the next generation of household robots. That data is valuable enough for us to offer cleaning services free of charge for a limited time.
Here is the link, via Glenn Mercer.
My twenty-minute AI talk for the Swedish company Sana
Law professors prefer AI over peer answers
Large language models (LLMs) are increasingly promoted as educational tutors, yet most evaluations focus on domains with a single ground truth. Many disciplines, however, hinge on judgment: reasoning, weighing ambiguity, and reaching defensible conclusions. Law provides a sharp test. We conducted a blinded evaluation of short-answer tutoring in contracts courses with sixteen U.S. law professors. Participants created 40 representative questions, wrote answers, and judged 2,918 anonymized comparisons between human and LLM responses. Professors rated LLMs far higher than their peers (average win rate = 75.33%), with models performing similarly to the best instructor. LLM responses were also rarely flagged as harmful (3.53%, vs 12.06% for professors). Preferences for LLM answers were consistent across evaluators and reflected shared professional standards. Our evaluation can be reliably extended to additional models by employing a separate LLM as a judge, rendering expert agreements an effective, scalable method to evaluate AI tutors in judgment-rich domains.
“far”. That is from a new paper by Alejandro Salinas, et al. Via Andrew Curran. And via John Chamberlain:
Artificial intelligence (AI) and large language models (LLMs) tools are capable of mass-producing academic finance papers that are nearly indistinguishable from human-authored research, according to a new study published in the Journal of Economic Literature.
C’mon people, get ready. I know it is difficult to admit when your human capital has been devalued, but that time is upon us. In particular, being prolific is no longer such a comparative advantage in academia. You might run to the “but I know what questions to ask” cope, but I implore you to solve for the equilibrium. What is the equilibrium wage for merely asking questions?
Of course academic life and projects will continue, but the real rewards will go to people doing new, innovative, and hitherto impossible projects with AI.
The US Exports Intelligence
Most Americans work in the service sector, so it’s not surprising that most export-related jobs are in the service sector. (The U.S. exports about $2.2 trillion of goods and $1.2 trillion of services, but services are more labor intensive than manufacturing, so they support more export jobs per dollar.)
Richard Baldwin writes:
In 2022, US service exports supported 8.9 million American jobs.
US manufacturing exports supported 2.2 million.
That’s four-to-one in favour of services. Yet in the national narrative, ‘export jobs’ almost always means things done in steel mills and factories.
…When a household in Germany pays for Netflix, that is an American export. When a Brazilian retailer buys Microsoft cloud capacity, that is an American export. When JPMorgan structures a financial deal in London, or an American consulting firm advises a company in Singapore, those are American exports too.
None of these is shipped in a container. No customs official records them as they clear the customshouse. Yet they are exports since they earn foreign income for America just as surely as the ‘Boeings, Beans and Beef’ that President Trump sold on his recent China trip.
Need I remind you that when OpenAI sells intelligence to people abroad, that is a US export? N.B. this is the future.
World trade in goods expanded roughly five-fold between 1990 and 2020. Trade in digitally enabled services expanded more than eleven-fold over the same period. These are the modern services.
The trade debate is fixated on manufacturing—where America is doing fine—while largely ignoring services, where America is crushing. Increasingly, our most valuable exports travel not on container ships but at the speed of light over fiber.
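For a rough sense of the "more export jobs per dollar" claim, here is a small sketch that divides Baldwin's 2022 job counts by the approximate export totals quoted in this post. Two caveats, both mine: the dollar figures may not refer to 2022, and the goods total covers more than manufacturing, so treat the result as an order-of-magnitude illustration only. The constant and function names are hypothetical.

```python
# Illustrative only: figures come from different sources within the post
# and may not be from the same year or cover the same categories.

SERVICE_JOBS = 8.9e6          # jobs supported by service exports (Baldwin, 2022)
MANUFACTURING_JOBS = 2.2e6    # jobs supported by manufacturing exports (Baldwin, 2022)
SERVICE_EXPORTS_USD = 1.2e12  # approximate services exports quoted in the post
GOODS_EXPORTS_USD = 2.2e12    # approximate goods exports quoted in the post


def jobs_per_billion(jobs, exports_usd):
    """Jobs supported per $1 billion of exports."""
    return jobs / (exports_usd / 1e9)


if __name__ == "__main__":
    services = jobs_per_billion(SERVICE_JOBS, SERVICE_EXPORTS_USD)
    goods = jobs_per_billion(MANUFACTURING_JOBS, GOODS_EXPORTS_USD)
    print(f"Services: {services:,.0f} jobs per $1bn of exports")
    print(f"Goods:    {goods:,.0f} jobs per $1bn of exports")
    print(f"Jobs ratio (services / manufacturing): "
          f"{SERVICE_JOBS / MANUFACTURING_JOBS:.1f}")
```

On these numbers the jobs ratio is about four-to-one, as Baldwin says, while the jobs-per-dollar gap is larger still because services are the smaller export category by value.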
The returns to good data are rising
When we want A.I. to solve real problems for real people, we need to make sure the data exists. That means cleaning up government data sets that are currently in a shambles (a project that the province of Alberta’s government found A.I. could make much faster and easier). It may also mean funding the creation of novel data sets that could eventually give A.I. systems traction on scientific problems that are currently beyond our capability to solve. Those data sets — like the Protein Data Bank — would be public goods, and so would need to be funded by the public.
Here is a longer NYT column on AI from Ezra Klein. And this:
But much of the A.I. capacity will remain in the private sector. So a public agenda for A.I. should also give the private sector reason to work on public problems. Like in Operation Warp Speed, the government could define the outcomes it wants — a drug, a solution — and guarantee a market if it’s found and distributed equitably.
Negativism is not going to win in this sphere.
AI in GDP
- Quality-adjusted AI production in the United States grew at over 2,000 percent per year in 2024 and 2025, driven by three compounding forces: expanding data-center capacity, hardware efficiency gains, and—the largest of the three—algorithmic progress.
- Treating the AI sector as a coherent economic entity yields preliminary estimates of nominal AI GDP at approximately $250 billion in 2025, growing at roughly 2,600 percent per year in quality-adjusted real terms.
- National economic statistics accounts were not designed to track this kind of activity. Statistics agencies should begin developing AI-focused satellite accounts now, before the measurement gap becomes a policy gap.
Here is much more from Anton Korinek and Patrick McKelvey. Via the excellent Samir Varma.
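Growth rates in the thousands of percent are hard to read at a glance, so here is a minimal sketch that converts them into annual multipliers and implied doubling times. It assumes smooth compounding within the year, which is my simplification and not something the authors state; the function names are illustrative.

```python
import math

# Convert "X percent per year" into a multiplier and a doubling time.
# Assumption: growth compounds smoothly within the year.


def growth_multiplier(pct_per_year):
    """2,000 percent growth means the level ends the year at 21x its start."""
    return 1 + pct_per_year / 100


def doubling_time_months(pct_per_year):
    """Months needed to double at the given annual growth rate."""
    return 12 * math.log(2) / math.log(growth_multiplier(pct_per_year))


if __name__ == "__main__":
    for pct in (2000, 2600):
        mult = growth_multiplier(pct)
        months = doubling_time_months(pct)
        print(f"{pct:,}% per year = {mult:.0f}x per year, "
              f"doubling about every {months:.1f} months")
```

Read this way, 2,000 percent per year sustained over both 2024 and 2025 would compound to roughly a 441-fold increase across the two years.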
Seven ways to avoid losing your job to AI
That is the theme of my latest Free Press column; here is one excerpt:
Principle five: Run experiments.
This is a more general version of the healthcare point. AI will generate so many new ideas and hypotheses, including for drugs and medical devices, but not only. Become a tester. Test new battery designs, new educational techniques, or new methods of conserving valued wildlife.
The demand for experiments will rise sharply, and most of those cannot be done by robots, at least not anytime soon.
Principle six: Gather data.
AI is a marvelous tool, but it relies on knowing lots about the world. That can stem from reading the internet, watching videos of people folding clothes, and hearing recordings of voices, among many other ways of absorbing information.
The more powerful the AI, the higher the returns from feeding it data, because it will make smart and useful inferences from those data. But most data in our world have never been put into AI models. Just consider corporate records, historical archives, referee reports for failed scientific papers, accounts of lab procedures, and much more. Most of that remains virgin territory.
The next few decades will bring an immense investment in feeding more data into the AIs. So there will be new jobs in gathering environmental data, job safety data, construction site data, corporate and management data, public health data, agricultural data, education data, and much more. Those jobs could be yours.
Recommended.
Robert Wright’s *The God Test*
The subtitle is Artificial Intelligence and Our Coming Cosmic Reckoning, due out June 23.
In the first chapter, Wright summarizes four of his perspectives; these are my paraphrases of his pp. 5-6:
1. When it comes to AI, we should be somewhere on the awe spectrum.
2. We can create a future where the upside of AI far outweighs the downside, though that involves steering human understanding toward the better side of the awe spectrum.
3. A major reorientation of human thought is required, and right now few people seem inclined to do that.
4. The worldviews of the current AI accelerationists and also doomers are not cosmic enough.
It is a good time for this book to be published, and I agree with much more of it than I disagree with. My main difference is that I am more focused on very small things — such as Rainier cherries and the forthcoming three to four hour Apichatpong movie — than on cosmic awe per se. For better or worse, I was not born with those genes, and unlike Wright I am far from Buddhism. I do think there will be a transformation of “observed awe,” and I am somewhat worried that it will not go well. Will we be good at building a fairly new world, if not from scratch, on the basis of some new premises about what is possible and what is not? I will in any case interpret the pending transformation through a Straussian lens, namely thinking that a lot of the observed transformation of awe will be about something other than what people are claiming. It will be about people arguing over relative status, but under different guises. Not as tasty as a good Rainier cherry, but interesting to follow as well.
But are we still good at steering and evolving grand visions? Christianity and the Enlightenment are a hard act to follow.
What should I ask Chase Koch?
Yes I will be doing a Conversation with him. Chase and Charles Koch have a new book out, namely
Robin (it’s happening)
Scientific discovery is driven by the iterative process of observation, hypothesis generation, experimentation, and data analysis. Despite recent advancements in applying artificial intelligence to biology, no system has yet automated all these stages [1, 2, 3]. Here, we introduce Robin, the first multi-agent system capable of fully automating both hypothesis generation and data analysis for experimental biology. By integrating literature search agents with data analysis agents, Robin can generate hypotheses, propose experiments, interpret experimental results, and generate updated hypotheses, achieving a semi-autonomous approach to scientific discovery. By applying this system, we were able to identify promising therapeutic candidates for dry age-related macular degeneration (dAMD), the major cause of blindness in the developed world [4, 5]. Robin proposed enhancing retinal pigment epithelium phagocytosis as a therapeutic strategy, and identified and confirmed in vitro efficacy for ripasudil and KL001. Ripasudil is a clinically-used Rho kinase (ROCK) inhibitor that has never previously been proposed for treating dAMD. To elucidate the mechanism of ripasudil-induced upregulation of phagocytosis, Robin then proposed and analyzed a follow-up RNA-seq experiment, which revealed upregulation of ABCA1, a lipid efflux pump and possible novel target. All hypotheses, experimental directions, data analyses, and data figures in the main text of this report were produced by Robin. As the first AI system to autonomously discover and validate novel therapeutic candidates within an iterative lab-in-the-loop framework, Robin establishes a new paradigm for AI-driven scientific discovery.
Here is the full article from Nature. And here are two other new Nature pieces on related topics.
John Burn-Murdoch on phones and fertility
From my email:
Hi folks, appreciate the discussion of the piece here, as ever.
I just wanted to chime in briefly with an analogy that speaks to one of the ways I think about the causal mechanism here, and to my mind pushes back against the argument that since past declines in fertility didn’t come from smartphones etc the current decline can’t either.
• In the past, weight loss generally came from sustained dieting and exercise
• Now it overwhelmingly comes from injecting GLP-1s
• In the same way that GLP-1s are a technological shock that amplifies/accelerates the old mechanism (eating less), social media is a technological shock that amplifies/accelerates the old mechanism (cultural change)
To my mind one of the ways (possibly the main way) that phones and social media could be affecting fertility is by accelerating and internationalising pre-existing trends of cultural change. One example could be young women’s sense of empowerment and independence, which was on the rise in many parts of the world but has sped up over the past decade or two (I would point to my previous work on the ideological gender divide as one piece of evidence here) and has spread rapidly to regions and cultures that were surely very unlikely to reach this point without exposure to western social media.
Thoughts?
I will add one point on this debate, noting I do not think it runs counter to Burn-Murdoch. Some commentators are insisting that what really matters is how many children survive to adulthood, not how many are born in the first place. But both numbers matter a good deal. Every time a woman gets pregnant she incurs significant costs, especially in older times when death in childbirth was common, or even death or health problems from a miscarriage were a much greater risk. Furthermore, if you tried for seven kids, but only expected three or four to survive, a lot of times more than three or four survived. So general survival of all or almost all your children had to be a palatable option, even if the expected value was lower than that.