The Madmen and the AIs
In Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance, Harang Ju and Sinan Aral (both at MIT) paired humans and AIs in a set of marketing tasks to generate 11,138 ads for a large think tank. The basic story is that working with the AIs increased productivity substantially. Important, but not surprising. But here is where it gets wild:
[W]e manipulated the Big Five personality traits for each AI, independently setting them to high or low levels using P2 prompting (Jiang et al., 2023). This allows us to systematically investigate how AI personality traits influence collaborative work and whether there is heterogeneity in their effects based on the personality traits of the human collaborators, as measured through a pre-task survey.
In other words, they created AIs that were high or low on each of the “Big Five” OCEAN traits (Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism) and then paired the different AIs with humans who were also rated on the Big Five.
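To make the design concrete: setting each of five traits independently to high or low gives 2^5 = 32 possible AI personality profiles. The sketch below only enumerates that space. The trait keys and the "low"/"high" labels are illustrative names, not the authors' code, and the paper may not have used every profile.

```python
from itertools import product

# Illustrative labels for the Big Five (OCEAN) traits; names are assumptions.
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]
LEVELS = ["low", "high"]


def personality_profiles():
    """Enumerate every low/high assignment across the five traits."""
    return [dict(zip(TRAITS, combo))
            for combo in product(LEVELS, repeat=len(TRAITS))]


profiles = personality_profiles()
print(len(profiles))  # size of the full 2^5 design space
```

Pairing each profile with a human who is also scored on five traits is what lets the study ask whether particular AI-human combinations behave differently.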
The results were quite amusing. For example, a neurotic AI tended to make a lot more copy edits unless paired with an agreeable human.
AI Alex: What do you think of this edit I made to the copy? Do you think it is any good?
Agreeable Alex: It’s great!
AI Alex: Really? Do you want me to try something else?
Agreeable Alex: Nah, let’s go with it!
AI Alex: Ok. 🙂
Similarly, if a highly conscientious AI and a highly conscientious human were paired together they exchanged a lot more messages.
It’s hard to generalize from one study to know exactly which AI-human teams will work best but we all know some teams just work better–every team needs a booster and a sceptic, for example– and the fact that we can manipulate AI personalities to match them with humans and even change the AI personalities over time suggests that AIs can improve productivity in ways going beyond the ability of the AI to complete a task.
Hat tip: John Horton.
Men watch women’s sports more than women do
Here is the link, via Alex T.
Balaji on the new image release
A few thoughts on the new ChatGPT image release.
(1) This changes filters. Instagram filters required custom code; now all you need are a few keywords like “Studio Ghibli” or Dr. Seuss or South Park.
(2) This changes online ads. Much of the workflow of ad unit generation can now be automated, as per QT below.
(3) This changes memes. The baseline quality of memes should rise, because a critical threshold of reducing prompting effort to get good results has been reached.
(4) This may change books. I’d like to see someone take a public domain book from Project Gutenberg, feed it page by page into Claude, and have it turn it into comic book panels with the new ChatGPT. Old books may become more accessible this way.
(5) This changes slides. We’re now close to the point where you can generate a few reasonable AI images for any slide deck. With the right integration, there should be fewer bullet-point-only presentations.
(6) This changes websites. You can now generate placeholder images in a site-specific style for any <img> tag, as a kind of visual Lorem Ipsum.
(7) This may change movies. We could see shot-for-shot remakes of old movies in new visual styles, with dubbing just for the artistry of it. Though these might be more interesting as clips than as full movies.
(8) This may change social networking. Once this tech is open source and/or cheap enough to widely integrate, every upload image button will have a generate image alongside it.
(9) This should change image search. A generate option will likewise pop up alongside available images.
(10) In general, visual styles have suddenly become extremely easy to copy, even easier than frontend code. Distinction will have to come in other ways.
Here is the full tweet.
Why LLMs are so good at economics
I can think of a few reasons:
At least for the time being, even very good LLMs cannot be counted on for originality. And at least for the time being, good economic reasoning does not require originality, quite the contrary.
Good chains of reasoning in economics are not too long and complicated. If they run on for very long, there is probably something wrong with the argument. The length of these effective reasoning chains is well within the abilities of the top LLMs today.
Plenty of good economics requires a synthesis of theoretical and empirical considerations. LLMs are especially good at synthesis.
In economic arguments and explanations, there are very often multiple factors. LLMs are very good at listing multiple factors; sometimes they are “too good” at it: “aargh! not another list, bitte…”
Economics journal articles are fairly high in quality and they are generally consistent with each other, being based on some common ideas such as demand curves, opportunity costs, gains from trade, and so on. Odds are that a good LLM has been trained “on the right stuff.”
A lot of core economics ideas are “hard to see from scratch,” but “easy to grasp once you see them.” This too plays to the strength of the models as strong digesters of content.
And so quality LLMs will be better at economics than many other fields of investigation.
Wednesday assorted links
1. Inaccurate beliefs about skill decay. George Loewenstein papers have been quite good historically…
2. New Dwarkesh book with Stripe Press, an oral history of AI.
3. New Alice Evans paper on the global Islamic revival.
5. 2025 color stats.
Canada and America in Better Times
On November 4, 1979, a mob of radical university students and supporters of Ayatollah Khomeini surged over the wall and occupied the US Embassy in Tehran. Fifty-two Americans were taken hostage but six evaded capture. Hiding out for days, the escapees managed to contact Canadian diplomat John Sheardown and Canadian Ambassador Ken Taylor and asked for help. The Government of Canada reports:
Taylor didn’t hesitate. The Americans would be given shelter – the question was where. Because the Canadian Chancery was right downtown, it was far too dangerous. It would be better to split up the Americans. Taylor decided Sheardown should take three of the hostages to his house, while he would house the others at the official residence. They would be described to staff as tourists visiting from Canada. Taylor immediately began drafting a cable for Ottawa.
…Taylor’s telegram set off a frenzy of consultation in the Department of External Affairs….Michael Shenstone, immediately concurred that Canada had no choice but to shelter the fugitives. Under-Secretary Allan Gotlieb agreed. Given the danger the Americans were in, he noted, there was “in all conscience…no alternative but to concur” despite the risk to Canadians and Canadian property.
The Minister, Flora MacDonald, could not be immediately reached as she was involved in a television interview. However, when finally informed of the situation, she agreed that Taylor must be permitted to act…[Prime Minister Joe Clark was pulled] from Question Period in the House of Commons, she briefed him on the situation and obtained his immediate go-ahead. Soon after, a telegram was sent to Tehran – Taylor could act to save the Americans. He was told that knowledge of the situation would be on a strict “need-to-know” basis.
The CIA reports:
The exfiltration task was daunting–the six Americans had no intelligence background; planning required extensive coordination within the US and Canadian governments; and failure not only threatened the safety of the hostages but also posed considerable risk of worldwide embarrassment to the US and Canada.
…After careful consideration of numerous options, the chosen plan began to take shape. Canadian Parliament agreed to grant Canadian passports to the six Americans. The CIA team together with an experienced motion-picture consultant devised a cover story so exotic that it would not likely draw suspicions–the production of a Hollywood movie.
The team set up a dummy company, “Studio Six Productions,” with offices on the old Columbia Studio lot formerly occupied by Michael Douglas, who had just completed producing The China Syndrome. This upstart company titled its new production “Argo” after the ship that Jason and the Argonauts sailed in rescuing the Golden Fleece from the many-headed dragon holding it captive in the sacred garden–much like the situation in Iran. The script had a Middle Eastern sci-fi theme that glorified Islam. The story line was intentionally complicated and difficult to decipher. Ads proclaimed Argo to be a “cosmic conflagration” written by Teresa Harris (the alias selected for one of the six awaiting exfiltration).
President Jimmy Carter approved the rescue operation.
The American diplomats escaped and all the Canadians quickly exited before the Iranian government realized what had happened. The Canadian embassy was closed. The story of the exfiltration is told in the excellent movie, Argo, directed by and starring Ben Affleck. (The movie ups the American involvement for Hollywood but is still excellent.)
The CIA reports on what happened when the Americans made it back home:
News of the escape and Canada’s role quickly broke. Americans went wild in celebrating their appreciation to Canada and its Embassy staff. The maple leaf flew in a hundred cities and towns across the US. Billboards exclaimed “Thank you Canada!” Full-page newspaper ads expressed American’s thanks to its neighbors to the north. Thirty-thousand baseball fans cheered Canada’s Ambassador to Iran and the six rescued Americans, honored guests at a game in Yankee Stadium.
I remember this time well because my father, a professor of Mechanical Engineering at the University of Toronto, happened to be giving a talk in Boston when the news broke. He was immediately mobbed by appreciative Americans, who thanked him, clapped him on the back, and bought him drinks. My father was moved by the American response but was also somewhat bemused, considering he was also Iranian. (Though, in truth, my father was the ideal Canadian and he had his own experiences exfiltrating people from Iran—but that story remains Tabarrok classified.)
Rethinking regulatory fragmentation
Regulatory fragmentation occurs when multiple federal agencies oversee a single issue. Using the full text of the Federal Register, the government’s official daily publication, we provide the first systematic evidence on the extent and costs of regulatory fragmentation. Fragmentation increases the firm’s costs while lowering its productivity, profitability, and growth. Moreover, it deters entry into an industry and increases the propensity of small firms to exit. These effects arise from redundancy and, more prominently, from inconsistencies between government agencies. Our results uncover a new source of regulatory burden, and we show that agency costs among regulators contribute to this burden.
That is from a new paper by Joseph Kalmenovitz, Michelle Lowry, and Ekaterina Volkova, forthcoming in Journal of Finance. Via the excellent Kevin Lewis.
Tuesday assorted links
1. 23 Australians sent to the hospital by red ant stings.
2. Maurice Obstfeld on the trade deficit.
3. William Vollmann update, good piece.
4. The State Capacity crisis, and a popular version here.
Jonathan Bechtel on AI tutoring (from my email)
You recently mentioned the Alpha School and their claims about AI tutoring. I share the skepticism expressed in your comments section regarding selection bias and the lack of validated academic benchmarks.
I wanted to highlight a more rigorously evaluated project called Tutor CoPilot, conducted jointly by Stanford’s NSSA and the online tutoring firm FEVTutor (sadly they’ve since gone bankrupt). To my knowledge, it’s the first and only RCT examining AI-assisted tutoring in real K-12 school districts.
Here’s the study: https://nssa.stanford.edu/studies/tutor-copilot-human-ai-approach-scaling-real-time-expertise
Key findings:
- Immediate session-level learning outcomes improved by 4-9%.
- Remarkably, the tool impacted tutors even more than students. After six weeks, inexperienced tutors reached performance parity with seasoned tutors, and previously low-performing tutors achieved average-level results.
Having contributed directly to the implementation, I observed tutors adapting their interactions based on insights from the AI. This study did not measure its impact on more distal measures of learning like standardized tests and benchmark assessments, but this type of research is in the works at various organizations.
Given your recent writings on AI and education, I thought you’d find this compelling.
Argentina’s DOGE
Cato has a good summary of Deregulation in Argentina:
- The end of Argentina’s extensive rent controls has resulted in a tripling of the supply of rental apartments in Buenos Aires and a 30 percent drop in price.
- The new open-skies policy and the permission for small airplane owners to provide transportation services within Argentina has led to an increase in the number of airline services and routes operating within (and to and from) the country.
- Permitting Starlink and other companies to provide satellite internet services has given connectivity to large swaths of Argentina that had no such connection previously. Anecdotal evidence from a town in the remote northwestern province of Jujuy implies a 90 percent drop in the price of connectivity.
- The government repealed the “Buy Argentina” law similar to “Buy American” laws, and it repealed laws that required stores to stock their shelves according to specific rules governing which products, by which companies and which nationalities, could be displayed in which order and in which proportions.
- Over-the-counter medicines can now be sold not just by pharmacies but by other businesses as well. This has resulted in online sales and price drops.
- The elimination of an import-licensing scheme has led to a 20 percent drop in the price of clothing items and a 35 percent drop in the price of home appliances.
- The government ended the requirement that public employees purchase flights on the more expensive state airline and that other airlines cannot park their airplanes overnight at one of the main airports in Buenos Aires.
- In January, Sturzenegger announced a “revolutionary deregulation” of the export and import of food. All food that has been certified by countries with high sanitary standards can now be imported without further approval from, or registration with, the Argentine state. Food exports must now comply only with the regulations of the destination country and are unencumbered by domestic regulations.
Needless to say, America’s DOGE could learn something from Argentina:
Milei’s task of turning Argentina once again into one of the freest and most prosperous countries in the world is herculean. But deregulation plays a key role in achieving that goal, and despite the reform agenda being far from complete, Milei has already exceeded most people’s expectations. His deregulations are cutting costs, increasing economic freedom, reducing opportunities for corruption, stimulating growth, and helping to overturn a failed and corrupt political system. Because of the scope, method, and extent of its deregulations, Argentina is setting an example for an overregulated world.
Congratulations to the Marginal Revolution University team!
This Valentine’s Day was a memorable one as MRU finally hit its nationwide goal of having MRU products adopted by 25% of U.S. high schools by the year 2025. (See our final MRU Adoption & Market Share map for 2-14-25.)
Great work, bravo!
China fact of the day
Buried in China’s latest government budget were some numbers that add up to an alarming trend. Tax revenue is dropping.
The decline means that China’s national government has less money to address the country’s serious economic challenges, including a housing market crash and the near bankruptcy of hundreds of local governments…
Tax revenue fell further last year than ever before…Overall tax revenue fell 3.4 percent last year.
…Fitch Ratings calculates that overall revenue for the national and local governments — including taxes and land sales — totaled 29 percent of the economy’s output as recently as 2018. But this year’s budget indicates that overall revenue will be just 21.1 percent of the economy in 2025.
Roughly half of the decline comes from plummeting revenue from land sales, a well-documented problem related to the housing-market crash, but the rest comes from weakness in tax revenue, a new problem.
That adds up to a huge sum of money. If overall revenue had kept up with the economy over the past seven years, the Chinese government would have another $1.5 trillion to spend in 2025.
China announced this month that it would allow its official target for the budget deficit to increase to 4 percent this year, after trying to keep it near 3 percent ever since the global financial crisis in 2009. But analysts say the true deficit is already much larger, because China is quietly counting a lot of long-term borrowing as though it were tax revenue.
Comparing spending only with actual revenue, without the borrowing, the Finance Ministry’s budget shows a deficit equal to almost 9 percent of the economy. In 2018, it was only 3.2 percent…
Income taxes collected from individuals were 7.5 percent below expectations last year, the Finance Ministry said in its budget.
Good thing they still are growing at five percent! Here is more from Keith Bradsher at the NYT.
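A quick back-of-the-envelope check on the quoted figures: if the revenue share fell from 29 percent to 21.1 percent of output and that gap is worth $1.5 trillion in 2025, the implied size of the economy follows directly. The variable and function names below are illustrative, and the even split between land sales and taxes is taken from the article's "roughly half."

```python
# Figures quoted in the excerpt above.
share_2018 = 0.29       # overall revenue as a share of output, 2018
share_2025 = 0.211      # budgeted share for 2025
shortfall_usd = 1.5e12  # extra revenue if the 2018 share had held


def implied_gdp(shortfall, old_share, new_share):
    """Output level at which the drop in revenue share equals the shortfall."""
    return shortfall / (old_share - new_share)


gdp = implied_gdp(shortfall_usd, share_2018, share_2025)
gap_pp = (share_2018 - share_2025) * 100  # gap in percentage points
land_sales_pp = gap_pp / 2                # "roughly half" from land sales

print(f"Implied 2025 output: ${gdp / 1e12:.1f} trillion")
print(f"Revenue gap: {gap_pp:.1f} points, about {land_sales_pp:.2f} from land sales")
```

The implied output of roughly $19 trillion is in line with common estimates of China's economy, so the article's numbers hang together.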
Monday assorted links
1. Is YouTube why Herbie Hancock has not made an album in so long?
2. Often you need only one person with the AI. And the demand for translators is falling.
4. Speculative claims about AI tutoring.
5. The peso in Argentina is now quite overvalued, and this is a dangerous situation. And more from the FT.
What should I ask Ken Rogoff?
Yes, I will be doing a Conversation with him. He has a new book coming out.
Ken is tenured at Harvard, here is his Wikipedia page, here is Ken on scholar.google.com, here is o1 pro on Rogoff, and he also holds the title of chess grandmaster.
What Follows from Lab Leak?
Does it matter whether SARS-CoV-2 leaked from a lab in Wuhan or had natural zoonotic origins? I think on the margin it does matter.
First, and most importantly, the higher the probability that SARS-CoV-2 leaked from a lab the higher the probability we should expect another pandemic.* Research at Wuhan was not especially unusual or high-tech. Modifying viruses such as coronaviruses (e.g., inserting spike proteins, adapting receptor-binding domains) is common practice in virology research and gain-of-function experiments with viruses have been widely conducted. Thus, manufacturing a virus capable of killing ~20 million human beings or more is well within the capability of, say, ~500-1000 labs worldwide. The number of such labs is growing and such research is becoming less costly and easier to conduct. Thus, lab-leak means the risks are larger than we thought and increasing.
A higher probability of a pandemic raises the value of many ideas that I and others have discussed such as worldwide wastewater surveillance, developing vaccine libraries and keeping vaccine production lines warm so that we could be ready to go with a new vaccine within 100 days. I want to focus, however, on what new ideas are suggested by lab-leak. Among these are the following.
Given the risks, a “Biological IAEA” with authority similar to that of the International Atomic Energy Agency to conduct unannounced inspections at high-containment labs does not seem outlandish. (Indeed, the people at the Bulletin of the Atomic Scientists are about the only ones to have begun to study the issue of pandemic lab risk.) Under the Biological Weapons Convention such authority already exists but it has never been used for inspections, mostly because of opposition by the United States and because the meaning of biological weapon is unclear, as pretty much everything can be considered dual use. Notice, however, that nuclear weapons have killed ~200,000 people while accidental lab leak has probably killed tens of millions of people. (And COVID is not the only example of deadly lab leak.) Thus, we should consider revising the Biological Weapons Convention to something like a Biological Dangers Convention.
BSL3 and especially BSL4 safety procedures are very rigorous; thus the issue is not primarily that we need more regulation of these labs but rather that we need to make sure high-risk research isn’t conducted under weaker conditions. Gain-of-function research on viruses with pandemic potential (e.g. those with potential aerosol transmissibility) should be considered high-risk and only conducted when it passes a review and is done under BSL3 or BSL4 conditions. Making this credible may not be that difficult because most scientists want to publish. Thus, journals should require documentation of biosafety practices as part of manuscript submission and no journal should publish research that was done under inappropriate conditions. A coordinated approach among major journals (e.g., Nature, Science, Cell, Lancet) and funders (e.g. NIH, Wellcome Trust) can make this credible.
I’m more regulation-averse than most, and tradeoffs exist, but COVID-19’s global economic cost, estimated in the tens of trillions, so vastly outweighs the comparatively minor cost of upgrading global BSL-2 labs and improving monitoring that there is clear room for making everyone safer without compromising research. Incredibly, five years after the crisis there has been no change in biosafety regulation, none. That seems crazy.
Many people convinced of lab leak instinctively gravitate toward blame and reparations, which is understandable but not necessarily productive. Blame provokes defensiveness, leading individuals and institutions to obscure evidence and reject accountability. Anesthesiologists and physicians have leaned towards a less-punitive, systems-oriented approach. Instead of assigning blame, they focus in Morbidity and Mortality Conferences on openly analyzing mistakes, sharing knowledge, and redesigning procedures to prevent future harm. This method encourages candid reporting and learning. At its best a systems approach transforms mistakes into opportunities for widespread improvement.
If we can move research up from BSL2 to BSL3 and BSL4 labs we can also do relatively simple things to decrease the risks coming from those labs. For example, let’s not put BSL4 labs in major population centers or in the middle of hurricane-prone regions. We can also, for example, investigate which biosafety procedures are most effective and increase research into safer alternatives, such as surrogate or simulation systems, to reduce reliance on replication-competent pathogens.
The good news is that improving biosafety is highly tractable. The number of labs, researchers, and institutions involved is relatively small, making targeted reforms feasible. Both the United States and China were deeply involved in research at the Wuhan Institute of Virology, suggesting at least the possibility of cooperation—however remote it may seem right now.
Shared risk could be the basis for shared responsibility.
Bayesian addendum *: A higher probability of lab leak should also reduce the probability of zoonotic origin, but the latter is an already known risk and COVID doesn’t add much to our prior, while the former is new, so the net change in estimated risk is positive. In other words, the discovery of a relatively new source of risk increases our estimate of total risk.
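The addendum's logic can be illustrated with a toy model. The sketch below is my own framing, not anything from the post: it treats pandemics from each source as Poisson events with a Gamma prior on the yearly rate, where the second prior parameter acts like "years of prior evidence." The zoonotic prior is strong (lots of history), the lab-leak prior is weak (a new risk), and all the numbers are made up for illustration.

```python
def expected_total_rate(q, a_zoo=5.0, b_zoo=100.0, a_lab=0.1, b_lab=20.0):
    """Posterior mean yearly pandemic rate after one event (COVID).

    q is the probability that the event was a lab leak. With a Gamma(a, b)
    prior on a Poisson rate, one extra event moves the mean from a / b to
    (a + 1) / b, so averaging over the event's origin credits q of the
    event to the lab source and 1 - q to the zoonotic source.
    All default parameters are hypothetical.
    """
    zoo_rate = (a_zoo + (1.0 - q)) / b_zoo
    lab_rate = (a_lab + q) / b_lab
    return zoo_rate + lab_rate


for q in (0.0, 0.5, 1.0):
    print(f"P(lab leak) = {q:.1f} -> expected pandemics per year = "
          f"{expected_total_rate(q):.3f}")
```

In this setup the total rate rises with q whenever the lab prior carries less evidence than the zoonotic prior (b_lab < b_zoo), which is the point of the addendum: one event barely moves a well-established estimate but moves a thinly evidenced one a lot.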