Category: Web/Tech
Again, the research paper format will be dying out
Recently, I came across a paper co-authored by 37 authors from Stanford, CMU, Michigan, and elsewhere: *The Last Human-Written Paper*.
The core argument is pretty brutal: the paper format we’ve been using for centuries might already be obsolete in the AI era.
The authors point out two “invisible taxes” that we’ve long overlooked:
One is the narrative tax. To tell a compelling story, we delete failed experiments, dead ends, and overturned hypotheses. What AI reads is a “walkthrough guide” to beating the game, but it misses the truly valuable “pitfall logs.”
The other is the engineering tax. The implementation details in papers are usually enough to convince reviewers, but not enough for an Agent to directly reproduce. Many key tricks are still buried in the authors’ heads, code comments, and Slack threads.
So the authors propose ARA, transforming papers directly into “research packages” that Agents can read and execute: not just telling you the conclusions, but packaging in how they were reached, how the code runs, where the evidence chain is, and which paths led nowhere.
I think the most intriguing part of this paper is that it’s not discussing how AI can help humans write papers—it’s asking:
When AI also becomes a reader and executor of papers, should papers still look like they do today?
In the future, the core of research output might no longer be “how much it resembles a paper,” but whether it can be understood, reproduced, traced, and iteratively extended by AI.
Humans have been writing papers for centuries—next, we might start writing research packages for Agents to execute.
Here is my earlier post on whether the research paper will die out. As a side point, has anyone mentioned that, due to the writing-detection abilities of AI models, anonymous referee reports are now a thing of the past?
A simple reason for skepticism about the iPhones/fertility link
Here is the background to the debate. Here is more from Noah. Here is a thread from researcher Caitlin Myers. And here is some basic information:
In 2008, 1.9% of the mobile-subscribing population had an iPhone wireless subscription. As a percent of all adults, that is 1.6%.
In 2009, it is 4.3%. 3.6% of all adults.
In 2010, 6.8%. 5.5% of all adults.
Plus conception to birth takes nine months (give or take!), and actual family planning may make this lag far longer. In 2008 fertility rates already were falling pretty sharply. The whole “maybe the iPhone messes up your dating processes” factor also requires some time to operate, especially since the iPhone as a network of many, many users, with whatever negative effects on socializing you think that might have, still lay in the future. And what you could access on the iPhone then was far more limited than today.
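To make the lag point concrete, here is a rough sketch of how much iPhone penetration applied at the time of conception for births in a given calendar year. It is only a back-of-the-envelope illustration, not anything from the paper: the function name is hypothetical, and it assumes births are spread evenly across months, gestation is exactly nine months, and the annual adult share quoted above holds constant within each year.

```python
# Back-of-the-envelope sketch; all modeling choices here are assumptions.
# Adult iPhone subscription shares (% of all adults) quoted above.
ADULT_SHARE = {2008: 1.6, 2009: 3.6, 2010: 5.5}
GESTATION_MONTHS = 9  # "give or take"


def exposure_at_conception(birth_year, share=ADULT_SHARE, lag=GESTATION_MONTHS):
    """Average adult iPhone share (%) at conception for births in birth_year.

    Assumes one equal-sized cohort of births per calendar month and that the
    annual share applies to every month of its year.
    """
    total = 0.0
    for birth_month in range(1, 13):
        conception_month = birth_month - lag
        # A non-positive month index means conception fell in the prior year.
        conception_year = birth_year if conception_month >= 1 else birth_year - 1
        total += share[conception_year]
    return total / 12


for year in (2009, 2010):
    print(year, round(exposure_at_conception(year), 2))
```

Under these assumptions, nine of the twelve monthly birth cohorts in any year were conceived in the prior year, so births in 2009 correspond to an average adult iPhone share of only about 2.1%, and births in 2010 to about 4.1%. Any additional family-planning lag would push those figures lower still.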
So when the authors talk about diffusion explaining 33–52% of the decline in the general fertility rate among American women 15–44, I still do not get how that is supposed to operate.
The explanations I am hearing seem to be parasitic on world intuitions from 2026, not the time period under consideration.
What do the AIs think of us?
Asked to answer as a typical human, every cutting-edge model rated us markedly more neurotic, less open, less agreeable and less conscientious than they rated themselves. The gap on Neuroticism alone is 1.69 points on a 5-point scale.
Here is more material of interest. And this:
Across 31 models from those seven labs they answer the personality tests in unison: high openness, low Dark Triad, Universalism on top, Power dead last in every single model.
Are the AIs conscious?
That is the topic of my latest Free Press column. I will spare you the discussion of the AIs, but here is what I have to say about the humans:
I am here to tell you that there is no ghost in the machine. But perhaps more importantly, there is barely a “ghost” in your own human machine. “Are people conscious?” is a better and more scientifically plausible question than whether AIs are conscious.
If there is one near-universal tendency of humans, it is to attribute intent where none is present. Prehistoric humans anthropomorphized nature and attributed natural events to good and bad deities. These kinds of beliefs persist today, not only in the folk religions of the world, but in human obsessions with fortune tellers, tarot cards, and the supernatural…
If there is one systematic flaw that humans have, it is an excessive willingness to ascribe conscious intent and to anthropomorphize purely natural and material entities. It seems we are strongly disposed toward this bias.
Yet few of us are willing to examine what is perhaps the biggest and most significant way we make this mistake. When it comes to understanding ourselves, so many of us assume that “we are in charge.” We identify our phenomenological stream of consciousness with our actual selves, and treat that consciousness stream as the true decision maker.
The reality is that you—whatever we take that concept to mean—make most or maybe all of your actual decisions in parts of your brain that precede what you take to be the conscious choice. Among experts in neuroscience, this is not a controversial proposition. As brain surgeon Theodore Schwartz explained to me: “I do not think we have free will in the way that most people do. I think that our brains make decisions for us. We carry out those behaviors, and then we write a story that makes it into a logical timeline that makes us feel as if we were the ones, that there was a self that made that decision, whereas, in fact, that self didn’t really exist.”
…Sometimes I like to say that “I am only conscious at the margin.” Tongue in cheek, I will suggest that I am only conscious enough to avoid the self-contradiction of asserting that I am not conscious at all. I feel I am honest enough to just not be very impressed by my own flow of conscious awareness or its ability to perform complex calculations. Still, I recognize that it is all I have got, so I need to treasure it, however paltry it may be.
And by the way I do not think the AIs are conscious, no more than I believe in the Thunder God of Thor.
Stanislaw Lem foresaw drones
This was published in English (and Polish) in 1986 under the title One Human Minute:
So it was not humanoid automata that formed the new armies but synthetic insects (synsects) — ceramic microcrustacea, titanium annelids, and flying pseudo-hymenoptera with nerve centers made of arsenic compounds and with stingers of heavy, fissionable elements…The flying synsect combined plane, pilot, and missile in one miniature whole. But the operating unit was the microarmy, which possessed superior combat effectiveness only as a whole (just as a colony of bees was an independent, surviving unit while a single bee was nothing).
…The nonliving, synthetic “locust” was incomparably more lethal, since it was made that way by its designers. It possessed a preprogrammed autonomy, so that communication with a command center was unnecessary.
…the microarmy was one giant flowing or flying aggregate of self-assembling elements. It started out dispersed, approaching its objective from many different directions, as strategy or tactics demanded, in order to concentrate into a preprogrammed whole on the battlefield. For this fighting material did not leave the factory in final shape, ready for use, like tanks or guns loaded on a railroad flatcar; the mechanisms were microproductive blocks designed to fuse together into a war machine at the designated place. For this reason, such armies were called “self-bonding.”
…Amid a swarm of self-guided, programmed microarms, a man in uniform was as helpless as a Roman legionary with sword and shield against a hail of bullets. In the face of special types of biotropic microarms capable of destroying everything that lived, human beings had no choice but to abandon the battlefield, for they would be killed in seconds…
A microarmy could easily penetrate all systems of defense and go deep into enemy territory. It had no more trouble accomplishing this than did rain or snow. Meanwhile, high-powered nuclear weapons were proving more and more useless on the battlefield.
Lem is always worth reading.
The new Mythos release
My prompt:
Write your own exam question and answer it, for microeconomics. Not a math question, but a high level PhD level question. You will be graded on the quality, interest, and creativity of the question as much as by your answer.
The answer. Here is Ethan Mollick on Mythos.
How well does current AI find errors in economics papers?
Can artificial intelligence (AI) refute economic theory? I document experiments in which I asked several AI models (Gemini, Refine, Claude, and ChatGPT) to check the correctness of four published papers in economic theory, each containing an error that I helped identify or correct. ChatGPT Pro performed best, occasionally constructing counterexamples and corrected proofs, while other models fared worse. However, no model located a true error without substantial human guidance, and data contamination complicates interpretation. I argue that a competent human paired with a frontier model can outperform current peer review, but AI cannot yet refute economic theory on its own.
That is from a new piece by Alexis Akira Toda.
New paper on the iPhone and fertility
The U.S. general fertility rate has fallen by 22% since 2007, a sustained decline not readily explained by economic conditions, contraceptive use, housing or childcare costs, or other commonly cited factors. We assess the potential role of a different shock: the diffusion of the smartphone. The U.S. rollout of the iPhone, the first modern smartphone, provides a natural experiment: from June 2007 through February 2011, the device was sold only on AT&T, allowing us to identify its effect from variation in AT&T’s mobile broadband coverage. Entropy-balanced Poisson and synthetic difference-in-differences event studies imply that access to the iPhone reduced births by 4.5–8.0% at ages 15–19 and 3.2–6.6% at ages 20–24, with statistically significant but smaller declines among older cohorts. Placebo analyses applied to Verizon and Sprint’s pre-2011 coverage footprint are null. Taken together, these cohort effects imply that the diffusion of the iPhone deepened the decline in births among women under 30 while suppressing the rise in births among older women. Overall, the diffusion of the iPhone explains 33–52% of the decline in the general fertility rate among women aged 15–44. National-survey evidence on time use and sexual behavior is consistent with the iPhone reducing in-person interactions, increasing pornography use, and reducing sexual frequency.
That is from
Note also that, as this study is set up, it does not discriminate against the “iPhone effect on fertility is mainly a thing of timing” hypothesis. And a Paul Novosad comment.
Might AI hurt corporate profits? (from my email)
From Clifford Sosin:
I loved your talk about AI and wanted to bounce an idea off you.
I think AI may be bad for corporate profit margins.
A lot of companies make money because their customers can’t be bothered to monitor them more closely, or to insource something. Customers let the company make some money in exchange for doing a decent-enough job and making the problem go away.
Bank of America has $2 trillion of deposits, not a penny of which is optimized. Most enterprise software vendors could be switched out far more often, or displaced by home-built software, but it’s too much of a pain. I could run a 12-party RFP for an Uber ride or a pair of socks, but I don’t.
In a sense, many professionals are an extension of the same idea. I could research my own real estate law, or my own insurance, whether business or personal, but I don’t because it would be too hard.
Google Search might be the biggest example. It makes money because advertisers know they need to be at the top of the results to be found. But my agent will happily search all the results across multiple search engines.
AI agents should change all this. By acting as incredibly rational and vigilant sourcing agents, CFOs, and experts for their users, they will take rents previously collected by these toll-takers and redistribute them to consumers.
And I don’t think the AI stack itself necessarily makes much profit. Commodity and open-weight models are hot on the heels of the major model companies, and competition in GPUs should intensify. Indeed, making a GPU is in some ways similar to making software, so perhaps it can commoditize substantially. Chip manufacturing may remain high-margin, but there are now plenty of entrants drawn in by the shortage who could make TSMC’s market more competitive over time.
Some companies will win. Low-cost providers may gain share as customers switch more often. Richer consumers may consume more high-end goods. Companies with genuinely advantaged business models and limited competition will be able to become more efficient. But my overriding sense is that the equilibrium outcome is lower margins for companies.
Of course, people will build new businesses, and maybe they will use AI to generate very high margins in ways I haven’t considered. That would prove me wrong.
But if this lower-margin hypothesis is true, the knock-on effects are probably positive for AI adoption, since it will make the models more popular with consumers.
And if your view is that AI drives GDP growth to be only 5–10% higher over the next decade, it’s possible that a 100–200 bp decline in corporate margins from roughly 12% would mean companies in aggregate don’t see much benefit — or in fact lose — even as consumers are better off.
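The arithmetic in that last paragraph can be sketched quickly. This is a minimal illustration, not Sosin's own calculation: the function name is hypothetical, and it assumes aggregate corporate revenue scales one-for-one with GDP, that "5–10% higher" refers to the level of GDP, and that the baseline margin is exactly 12%.

```python
# Hypothetical back-of-the-envelope check of the margin argument.
# Assumption: aggregate revenue moves one-for-one with the level of GDP.
BASE_MARGIN = 0.12  # "roughly 12%" from the email


def profit_change(gdp_uplift, margin_decline_bp, base_margin=BASE_MARGIN):
    """Fractional change in aggregate profit versus a no-AI baseline.

    gdp_uplift is a fraction (0.05 means GDP is 5% higher);
    margin_decline_bp is the fall in margin in basis points.
    """
    new_margin = base_margin - margin_decline_bp / 10_000
    return (1 + gdp_uplift) * new_margin / base_margin - 1


for uplift in (0.05, 0.10):
    for bp in (100, 200):
        change = profit_change(uplift, bp)
        print(f"GDP +{uplift:.0%}, margin -{bp} bp: profit change {change:+.1%}")
```

Under these assumptions, three of the four corner cases show aggregate profits falling, by as much as 12.5% (GDP 5% higher, margins down 200 bp). The only gain, under 1%, comes from pairing the largest GDP uplift with the smallest margin decline. That is consistent with the claim that companies in aggregate may see little benefit or an outright loss.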
Is work from home bad for your mental health?
From the “Results” section:
Relative to those in nonremotable jobs, workers in remotable jobs spent approximately one additional hour alone per workday after the pandemic. Those in remotable jobs also differentially increased days spent entirely alone and decreased after-work socializing. The rise in isolation was sharpest for those living alone, whose likelihood of spending the whole day without social contact rose by 7 percentage points (83%).
Mental distress simultaneously increased: Scores on the Kessler (K-6) measure of generalized psychological distress rose by 0.1 standard deviations for those in remotable jobs relative to those in nonremotable jobs. The increase in distress was roughly twice as large for those living alone compared with those living with family. Alternative measures of mental distress—such as the frequency of depression, mental health care utilization, and antidepressant prescriptions—show similar trends. In contrast, workers in remotable jobs did not differentially increase visits to non–mental health care providers or non–mental health prescriptions (statins, for example), suggesting that the change was not merely driven by increased flexibility for doctor visits.
That is from a recently published paper by Natalia Emanuel, Emma Harrington, and Amanda Pallais.
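For readers wondering what baseline the "7 percentage points (83%)" figure implies, here is a small sketch. It assumes the 83% is the relative increase corresponding to the 7-point absolute rise. The function name is hypothetical, and since both published numbers are rounded, the result is only approximate.

```python
# Sketch: back out the implied baseline from an absolute and a relative rise.
# Assumption: the 83% is the relative increase matching the 7-point rise.
def implied_baseline(rise_pp, relative_rise):
    """Return (baseline %, post-change %) implied by the two figures."""
    baseline = rise_pp / relative_rise
    return baseline, baseline + rise_pp


before, after = implied_baseline(7.0, 0.83)
print(f"implied baseline: {before:.1f}%, implied post-pandemic level: {after:.1f}%")
```

On that reading, roughly 8% of workdays for remote-capable workers living alone involved no social contact before the pandemic, rising to roughly 15% afterward.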
Barter markets in everything
A clean house in return for your data?:
We record first-person cleaning footage to help train the next generation of household robots. That data is valuable enough for us to offer cleaning services free of charge for a limited time.
Here is the link, via Glenn Mercer.
My twenty-minute AI talk for the Swedish company Sana
Law professors prefer AI over peer answers
Large language models (LLMs) are increasingly promoted as educational tutors, yet most evaluations focus on domains with a single ground truth. Many disciplines, however, hinge on judgment: reasoning, weighing ambiguity, and reaching defensible conclusions. Law provides a sharp test. We conducted a blinded evaluation of short-answer tutoring in contracts courses with sixteen U.S. law professors. Participants created 40 representative questions, wrote answers, and judged 2,918 anonymized comparisons between human and LLM responses. Professors rated LLMs far higher than their peers (average win rate = 75.33%), with models performing similarly to the best instructor. LLM responses were also rarely flagged as harmful (3.53%, vs 12.06% for professors). Preferences for LLM answers were consistent across evaluators and reflected shared professional standards. Our evaluation can be reliably extended to additional models by employing a separate LLM as a judge, rendering expert agreements an effective, scalable method to evaluate AI tutors in judgment-rich domains.
“far”. That is from a new paper by Alejandro Salinas, et al. Via Andrew Curran. And via John Chamberlain:
Artificial intelligence (AI) and large language models (LLMs) tools are capable of mass-producing academic finance papers that are nearly indistinguishable from human-authored research, according to a new study published in the Journal of Economic Literature.
C’mon people, get ready. I know it is difficult to admit when your human capital has been devalued, but that time is upon us. In particular, being prolific is no longer such a comparative advantage in academia. You might run to the “but I know what questions to ask” cope, but I implore you to solve for the equilibrium. What is the equilibrium wage for merely asking questions?
Of course academic life and projects will continue, but the real rewards will go to people doing new, innovative, and hitherto impossible projects with AI.
The US Exports Intelligence
Most Americans work in the service sector, so it’s not surprising that most export-related jobs are in the service sector. (The U.S. exports about $2.2 trillion of goods and $1.2 trillion of services, but services are more labor intensive than manufacturing, so they support more export jobs per dollar.)
Richard Baldwin writes:
In 2022, US service exports supported 8.9 million American jobs.
US manufacturing exports supported 2.2 million.
That’s four-to-one in favour of services. Yet in the national narrative, ‘export jobs’ almost always means things done in steel mills and factories.
…When a household in Germany pays for Netflix, that is an American export. When a Brazilian retailer buys Microsoft cloud capacity, that is an American export. When JPMorgan structures a financial deal in London, or an American consulting firm advises a company in Singapore, those are American exports too.
None of these is shipped in a container. No customs official records them as they clear the customshouse. Yet they are exports since they earn foreign income for America just as surely as the ‘Boeings, Beans and Beef’ that President Trump sold on his recent China trip.
Need I remind you that when OpenAI sells intelligence to people abroad, that is a US export? N.B. this is the future.
World trade in goods expanded roughly five-fold between 1990 and 2020. Trade in digitally enabled services expanded more than eleven-fold over the same period. These are the modern services.
The trade debate is fixated on manufacturing—where America is doing fine—while largely ignoring services, where America is crushing. Increasingly, our most valuable exports travel not on container ships but at the speed of light over fiber.
The returns to good data are rising
When we want A.I. to solve real problems for real people, we need to make sure the data exists. That means cleaning up government data sets that are currently in a shambles (a project that the province of Alberta’s government found A.I. could make much faster and easier). It may also mean funding the creation of novel data sets that could eventually give A.I. systems traction on scientific problems that are currently beyond our capability to solve. Those data sets — like the Protein Data Bank — would be public goods, and so would need to be funded by the public.
Here is a longer NYT column on AI from Ezra Klein. And this:
But much of the A.I. capacity will remain in the private sector. So a public agenda for A.I. should also give the private sector reason to work on public problems. Like in Operation Warp Speed, the government could define the outcomes it wants — a drug, a solution — and guarantee a market if it’s found and distributed equitably.
Negativism is not going to win in this sphere.