Noam Brown reports

Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO with a general reasoning LLM—under the same time limits as humans, without tools.

Here is the link, here is some commentary.  And Nat McAleese.  And the prediction market.  Emad: “Two years ago, who would have said an IMO gold medal & topping benchmarks isn’t AGI?”  And it is good at other things too.

And from Alexander Wei: “Btw, we are releasing GPT-5 soon, and we’re excited for you to try it. But just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months.”

David Brooks on the AI race

When it comes to confidence, some nations have it and some don’t. Some nations once had it but then lost it. Last week on his blog, “Marginal Revolution,” Alex Tabarrok, a George Mason economist, asked us to compare America’s behavior during Cold War I (against the Soviet Union) with America’s behavior during Cold War II (against China). I look at that difference and I see a stark contrast — between a nation back in the 1950s that possessed an assumed self-confidence versus a nation today that is even more powerful but has had its easy self-confidence stripped away.

There is much more at the NYT link.

The Sputnik vs. DeepSeek Moment: The Answers

In The Sputnik vs. DeepSeek Moment I pointed out that the US response to Sputnik was fierce competition. Following Sputnik, we increased funding for education, especially math, science and foreign languages, organizations like ARPA were spun up, federal funding for R&D was increased, immigration rules were loosened, foreign talent was attracted and tariff barriers continued to fall. In contrast, the response to what I called the “DeepSeek” moment has been nearly the opposite. Why did Sputnik spark investment while DeepSeek sparks retrenchment? I examine four explanations from the comments and argue that the rise of zero-sum thinking best fits the data.

Several comments fixated on DeepSeek itself, dismissing it as neither impressive nor threatening. Perhaps, but DeepSeek was merely a symbol for China’s broader rise: the world’s largest exporter, manufacturer, electricity producer, and military by headcount. These critiques missed the point.

Some commenters argued that Sputnik provoked a strong response because it was seen as an existential threat, while DeepSeek—and by extension China—is not. I certainly hope China’s rise isn’t existential, and I’m encouraged that China lacks the Soviet Union’s revolutionary zeal. As I’ve said, a richer China offers benefits to the United States.

But many influential voices do view China as a very serious, even existential, threat—and unlike the USSR, China is economically formidable.

More to the point, perceived existential stakes don’t answer my question. If the threat were greater, would we suddenly liberalize immigration, expand trade, and fund universities? Unlikely. A more plausible scenario is that if the threat were greater, we would restrict harder—more tariffs, less immigration, more internal conflict.

Several commenters, including my colleague Garett Jones, pointed to demographics—especially voter demographics. The median age has risen from 30 in 1950 to 39 in recent years; today’s older, wealthier, more diverse electorate may be more risk-averse and inward-looking. There’s something to this, but it’s not sufficient. Changes in the X variables haven’t been enough to explain the change in response given constant betas, so demography doesn’t push that far. But does it even push in the right direction?

Age might correlate with risk-aversion, for example, but the Trump coalition isn’t risk-averse—it’s angry and disruptive, pushing through bold and often rash policy changes.
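The regression shorthand in the demographic argument can be written out. As a sketch, suppose the policy response y_t in period t is linear in electorate characteristics X_t (age, wealth, diversity) with coefficients β_t. The linear form and the symbols are assumptions for illustration; nothing here is estimated.

```latex
\begin{aligned}
y_1 - y_0 &= X_1\beta_1 - X_0\beta_0 \\
          &= \underbrace{(X_1 - X_0)\,\beta_0}_{\text{change in demographics at constant betas}}
           + \underbrace{X_1\,(\beta_1 - \beta_0)}_{\text{change in the betas}}
\end{aligned}
```

The claim is that the first term is too small to account for the change in response, which leaves most of the work to the second term: a change in how a given electorate responds, not in who the electorate is.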

A related explanation is that the U.S. state has far less fiscal and political slack today than it did in 1957. As I argued in Launching, we’ve become a warfare–welfare state—possibly at the expense of being an innovation state. Fiscal constraints are real, but the deeper issue is changing preferences. It’s not that we want to return to the moon and can’t—it’s that we’ve stopped wanting to go.

In my view, the best explanation for the starkly different responses to the Sputnik and DeepSeek moments is the rise of zero-sum thinking—the belief that one group’s gain must come at another’s expense. Chinoy, Nunn, Sequeira and Stantcheva show that the zero-sum mindset has grown markedly in the U.S. and maps directly onto key policy attitudes.

Zero-sum thinking fuels support for trade protection: if other countries gain, we must be losing. It drives opposition to immigration: if immigrants benefit, natives must suffer. And it even helps explain hostility toward universities and the desire to cut science funding. For the zero-sum thinker, there’s no such thing as a public good or even a shared national interest—only “us” versus “them.” In this framework, funding top universities isn’t investing in cancer research; it’s enriching elites at everyone else’s expense. Any claim to broader benefit is seen as a smokescreen for redistributing status, power, and money to “them.”

Zero-sum thinking doesn’t just explain the response to China; it’s also amplified by the China threat. (Hence, in direct opposition to some of the above theories, the people who most push the idea that the China threat is existential are the ones who are most pushing the zero-sum response.) Davidai and Tepper summarize:

People often exhibit zero-sum beliefs when they feel threatened, such as when they think that their (or their group’s) resources are at risk…Similarly, working under assertive leaders (versus approachable and likeable leaders) causally increases domain-specific zero-sum beliefs about success….. General zero-sum beliefs are more prevalent among people who see social interactions as a competition and among people who possess personality traits associated with high threat susceptibility, such as low agreeableness and high psychopathy, narcissism and Machiavellianism.

Zero-sum thinking can also explain the anger we see in the United States:

At the intrapersonal level, greater endorsement of general zero-sum beliefs is associated with more negative (and less positive) affect, more greed and lower life satisfaction. In addition, people with general zero-sum beliefs tend to be overly cynical, see society as unjust, distrust their fellow citizens and societal institutions, espouse more populist attitudes, and disengage from potentially beneficial interactions.

…Together, these findings suggest a clear association between both types of zero-sum belief and well-being.

Focusing on zero-sum thinking gives us a different perspective on some of the demographic issues. In the United States, for example, the young are more zero-sum thinkers than the old and immigrants tend to be less zero-sum thinkers than natives. The likeliest reason: those who’ve experienced growth understand that everyone can get a larger slice from a growing pie while those who have experienced stagnation conclude that it’s us or them.

The looming danger is thus the zero-sum trap: the more people believe that wealth, status, and well-being are zero-sum, the more they back policies that make the world zero-sum. Restricting trade, blocking immigration, and slashing science funding don’t grow the pie. Zero-sum thinking leads to zero-sum policies, which produce zero-sum outcomes—making the zero-sum worldview a self-fulfilling prophecy.

Someday I want to see the regressions

Each infant born from the procedure carries DNA from a man and two women. It involves transferring the nucleus from the fertilised egg of a woman carrying harmful mitochondrial mutations into a donated egg from which the nucleus has been removed.

For some carriers this is the only option because conventional IVF does not produce enough healthy embryos to use after pre-implantation diagnosis.

The researchers consistently reject the popular term “three-parent babies”, said Turnbull, “but it doesn’t make a scrap of difference.”

Here is more from Clive Cookson at the FT.  From Newcastle.  And here is some BBC coverage.

A Unifying Framework for Robust and Efficient Inference with Unstructured Data

This paper presents a general framework for conducting efficient inference on parameters derived from unstructured data, which include text, images, audio, and video. Economists have long used unstructured data by first extracting low-dimensional structured features (e.g., the topic or sentiment of a text), since the raw data are too high-dimensional and uninterpretable to include directly in empirical analyses. The rise of deep neural networks has accelerated this practice by greatly reducing the costs of extracting structured data at scale, but neural networks do not make generically unbiased predictions. This potentially propagates bias to the downstream estimators that incorporate imputed structured data, and the availability of different off-the-shelf neural networks with different biases moreover raises p-hacking concerns. To address these challenges, we reframe inference with unstructured data as a problem of missing structured data, where structured variables are imputed from high-dimensional unstructured inputs. This perspective allows us to apply classic results from semiparametric inference, leading to estimators that are valid, efficient, and robust. We formalize this approach with MAR-S, a framework that unifies and extends existing methods for debiased inference using machine learning predictions, connecting them to familiar problems such as causal inference. Within this framework, we develop robust and efficient estimators for both descriptive and causal estimands and address challenges like inference with aggregated and transformed missing structured data, a common scenario that is not covered by existing work. These methods, and the accompanying implementation package, provide economists with accessible tools for constructing unbiased estimators using unstructured data in a wide range of applications, as we demonstrate by re-analyzing several influential studies.

That is from a recent paper by Jacob Carlson and Melissa Dell.  Via Kevin Bryan.
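To make the "missing structured data" idea concrete, here is a minimal sketch. It is not the MAR-S estimator or the authors' package. It is the textbook difference estimator for a mean, and it assumes that a random subset of documents has hand-coded labels (missing at random) and that the same model scored every document. The function name `debiased_mean` and its arguments are hypothetical.

```python
from statistics import fmean


def debiased_mean(predictions_all, predictions_labeled, labels):
    """Estimate a population mean from model predictions, corrected for bias.

    predictions_all:     model predictions for every document
    predictions_labeled: model predictions for the hand-labeled subset
    labels:              the hand-coded values for that same subset
    """
    if not labels or len(predictions_labeled) != len(labels):
        raise ValueError("need a non-empty labeled subset with matching predictions")
    # Naive estimate: average the imputed values and ignore model error.
    naive = fmean(predictions_all)
    # Average prediction error on the labeled subset; under random labeling
    # this estimates the bias that the naive average inherits from the model.
    correction = fmean([y - p for y, p in zip(labels, predictions_labeled)])
    return naive + correction


if __name__ == "__main__":
    # Toy, made-up numbers purely to show the call shape.
    all_preds = [0.6, 0.6, 0.6, 0.6]
    labeled_preds = [0.6, 0.6]
    hand_labels = [1, 0]
    print(debiased_mean(all_preds, labeled_preds, hand_labels))
```

The design choice is that the predictions do the bulk of the work while the small labeled sample only has to estimate the model's average error, which is usually a less noisy quantity than the target itself.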

Hayek Goes Supersonic

When I post about lifting the ban on supersonic flight, smart commenters show up with charts: optimal fuel burn is at Mach 0.78–0.84, they say, or no one wants to pay thousands to save a few hours. Maybe. But my reply is always the same: Bottled water!

In 2024, Americans spent $47 billion on H₂O that they could get for nearly free. That still boggles my mind—but bottled water has passed the market test. I argue for lifting the SST ban, and similar policies, not because we know supersonics will work but because we don’t. Hayek reminds us that competition is a discovery procedure. Like science, markets generate knowledge by experiment—hypotheses are posted as prices, and the public accepts or rejects them through revealed preference. Fred Smith’s FedEx plan got a “C” in the classroom, but the market graded the experiment and returned an A in equity. Theory is great, but just as in science, there is no substitute for running the experiment.

Supersonics Takeoff!

In Lift the Ban on Supersonics I wrote:

Civilian supersonic aircraft have been banned in the United States for over 50 years! In case that wasn’t clear, we didn’t ban noisy aircraft; we banned supersonic aircraft. Thus, even quiet supersonic aircraft are banned today. This was a serious mistake. Aside from the fact that the noise was exaggerated, technological development is endogenous.

If you ban supersonic aircraft, the money, experience and learning by doing needed to develop quieter supersonic aircraft won’t exist. A ban will make technological developments in the industry much slower and dependent upon exogenous progress in other industries.

When we ban a new technology we have to think not just about the costs and benefits of a ban today but about the costs and benefits on the entire glide path of the technology.

In short, we must build to build better. We stopped building and so it has taken more than 50 years to get better. Not learning, by not doing.

… I’d like to see the new administration move forthwith to lift the ban on supersonic aircraft. We have been moving too slow.

Thus, I am pleased to note that President Trump has issued an executive order to lift the ban on supersonics!

The United States stands at the threshold of a bold new chapter in aerospace innovation.  For more than 50 years, outdated and overly restrictive regulations have grounded the promise of supersonic flight over land, stifling American ingenuity, weakening our global competitiveness, and ceding leadership to foreign adversaries.  Advances in aerospace engineering, materials science, and noise reduction now make supersonic flight not just possible, but safe, sustainable, and commercially viable.  This order begins a historic national effort to reestablish the United States as the undisputed leader in high-speed aviation.  By updating obsolete standards and embracing the technologies of today and tomorrow, we will empower our engineers, entrepreneurs, and visionaries to deliver the next generation of air travel, which will be faster, quieter, safer, and more efficient than ever before.

…The Administrator of the Federal Aviation Administration (FAA) shall take the necessary steps, including through rulemaking, to repeal the prohibition on overland supersonic flight in 14 CFR 91.817 within 180 days of the date of this order and establish an interim noise-based certification standard, making any modifications to 14 CFR 91.818 as necessary, as consistent with applicable law.  The Administrator of the FAA shall also take immediate steps to repeal 14 CFR 91.819 and 91.821, which will remove additional regulatory barriers that hinder the advancement of supersonic aviation technology in the United States.

Congratulations to Eli Dourado who has been pushing this issue for more than a decade.

Ideological Reversals Amongst Economists

Research in economics often carries direct political implications, with findings supporting either right-wing or left-wing perspectives. But what happens when a researcher known for publishing right-wing findings publishes a paper with left-wing findings (or vice versa)? We refer to these instances as ideological reversals. This study explores whether such researchers face penalties – such as losing their existing audience without attracting a new one – or if they are rewarded with a broader audience and increased citations. The answers to these questions are crucial for understanding whether academia promotes the advancement of knowledge or the reinforcement of echo chambers. In order to identify ideological reversals, we begin by categorizing papers included in meta-analyses of key literatures in economics as “right” or “left” based on their findings relative to other papers in their literature (e.g., the presence or absence of disemployment effects in the minimum wage literature). We then scrape the abstracts (and other metadata) of every economics paper ever published, and we deploy machine learning in order to categorize the ideological implications of these papers. We find that reversals are associated with gaining a broader audience and more citations. This result is robust to a variety of checks, including restricting analysis to the citation trajectory of papers already published before an author’s reversal. Most optimistically, authors who have left-to-right (right-to-left) reversals not only attract a new right-wing (left-wing) audience for their recent work, this new audience also engages with and cites the author’s previous left-wing (right-wing) papers, thereby helping to break down echo chambers.

That is from a new paper by Matt Knepper and Brian Wheaton, via Kris Gulati.  If it is audience-expanding for researchers to write such papers, does that mean we should trust their results less?

My Conversation with the excellent John Arnold

Here is the audio, video, and transcript.  Here is part of the episode summary:

Tyler and John discuss his shift from trading to philanthropy and more, including the specific traits that separate great traders from good ones, the tradeoffs of following an “inch wide, mile deep” trading philosophy, why he attended Vanderbilt, the talent culture at Enron, the growth in solar, the problem with Mexico’s energy system, where Canada’s energy exports will go, the hurdles to next-gen nuclear, how to fix America’s tripartite energy grid, how we’ll power new data centers, what’s best about living in Houston, his approach to collecting art, why trading’s easier than philanthropy, how he’d fix the US tax code and primary system, and what Arnold Ventures is focusing on next.

Excerpt:

COWEN: Say there’s a major volcanic event, and there’s a lot of ash in the sky for two or three years. Solar needs a backup. In the meantime, before the volcanic event happens — and of course, that’s quite rare — how much do we need to be up and running with the backup energy infrastructure? What do we need for reserve capacity in case the solar goes down?

ARNOLD: Good question. It would be difficult. It’s doable today. I think as solar continues to grow in market share, both in the US and globally, it will have to be met with some type of battery, a significant battery resource. That’s part of the economics of solar now, that it’s not just sticking it right outside of Phoenix, but it is solar plus transmission or solar plus battery. The question of what happens in that type of event — it would be difficult. The existing energy infrastructure is still largely around.

COWEN: But it will dwindle over time, right?

ARNOLD: It will dwindle over time.

COWEN: Is there some market issue? Say the volcanic event is only once every 150 years, but sooner or later, one happens. In the meantime, you need economic incentives for the gas or the nuclear to be ready. Does our government just keep on paying for those for 149 years in a row until the catastrophe comes?

ARNOLD: It’s a great question, and I think this is why nuclear, and particularly next-gen nuclear, is considered the holy grail, right? You’re not constrained by location. You’re not constrained by, is the wind blowing, is the sun shining? And it’s a clean resource. The problem today is just economics. In order to develop the current generation of nuclear, it’s extraordinarily expensive. Next generation — either small modular fission or fusion — both have a number of technological hurdles as well as unclear economics in how they compete.

I do think this question of how do you do this transition in a manner that maintains affordability but continues to get cleaner and lower emissions over time is a complex one, and I think it’s one that the environmentalists probably oversold five years ago in saying that this was going to be an easy transition. It’s certainly not. Just the scale and scope of the energy system is enormous, as you’re pointing to in your question. The need for backup, the need for a diversity of fuels, and how they complement each other is real, and you can’t replace that just with the intermittent resources we have today, plus battery.

And:

COWEN: What’s your most optimistic scenario for the US energy future from an environmental point of view, something that could plausibly happen?

ARNOLD: I think next-gen nuclear, if we can overcome the technical hurdles, if we can overcome the economic hurdles.

COWEN: But isn’t NIMBYism the biggest hurdle? The others I could imagine overcoming pretty readily, but I live in Fairfax County, which builds a fair amount. People there just don’t want nuclear. It’s irrational, but I’m not sure they’ll change their minds. It could be called fusion; it’s still nuclear to them.

ARNOLD: Yes, I’ve been surprised. That was my prior five years ago. I’ve been surprised at the number of jurisdictions that are inviting these next-gen nuclear companies to come. Texas, for instance, just passed a bill creating new incentives for nuclear companies to come and build their first plants and pilot projects in Texas. You see jurisdictions that are choosing to take the economic growth associated with it and that have more of a building culture and say, “Come here.”

I think, as things get proven out, then the question is, will the Fairfax counties of the world see what’s going on and become more agreeable to having that? I think it’s very similar to self-driving cars.

There’re some jurisdictions that say, “Come here. We want you to come, test,” and this is what’s happening in Texas. These companies say, “We want you to come pilot your projects here.” And some jurisdictions are saying, “No, prove it out, and then we’ll talk.”

COWEN: My nightmare is that even Texas becomes NIMBY. You see this in Austin already. Houston, Dallas will become more like the rest of America over time, maybe even San Antonio someday, El Paso with more time.

Interesting throughout, recommended.  We also talk about art and art collecting…

Economics coauthorships in the aftermath of MeToo

We study changes in coauthorships in economics, after the MeToo movement, using NBER and CEPR working papers between January 2004 and December 2020. We identify three main shifts in collaboration patterns. First, compared to pre-MeToo levels, collaborations across genders in an author’s seniority group increased: we estimate a 12.3% increase of women coauthors per 100 men-authored papers. Second, coauthorship shares of senior with junior economists declined by 3.0%, indicating a shift towards sorting of collaborations by seniority. Third, shares of new coauthorships declined by 5.4%, driven by drops in senior economists’ shares of new junior and new junior women by 18.4% and 48.0%, respectively. The results are robust to different specifications.

That is from a new paper by Noriko Amano-Patiño, Elisa Faraglia, and Chryssi Giannitsarou.  Via the excellent Kevin Lewis.  And here is a related paper on who receives credit for cross-gender co-authorships.

My excellent Conversation with Theodore Schwartz

Here is the audio, video, and transcript.  Here is part of the episode summary:

Tyler and Ted discuss how the training for a neurosurgeon could be shortened, the institutional factors preventing AI from helping more in neurosurgery, how to pick a good neurosurgeon, the physical and mental demands of the job, why so few women are currently in the field, whether the brain presents the ultimate bottleneck to radical life extension, why he thinks free will is an illusion, the success of deep brain stimulation as a treatment for neurological conditions,  the promise of brain-computer interfaces, what studying epilepsy taught him about human behavior, the biggest bottleneck limiting progress in brain surgery, why he thinks Lee Harvey Oswald acted alone, the Ted Schwartz production function, the new company he’s starting, and much more.

And an excerpt:

COWEN: I know what economists are like, so I’d be very worried, no matter what my algorithm was for selecting someone. Say the people who’ve only been doing operations for three years — should there be a governmental warning label on them the way we put one on cigarettes: “dangerous for your health”? If so, how is it they ever learn?

SCHWARTZ: You raise a great point. I’ve thought about this. I talk about this quite a bit. The general public — when they come to see me, for example, I’m at a training hospital, and I practiced most of my career where I was training residents. They’ll come in to see me, and they’ll say, “I want to make sure that you’re doing my operation. I want to make sure that you’re not letting a resident do the operation.” We’ll have that conversation, and I’ll tell them that I’m doing their operation, but that I oversee residents, and I have assistants in the operating room.

But at the same time that they don’t want the resident touching them, in training, we are obliged to produce neurosurgeons who graduate from the residency capable of doing neurosurgery. They want neurosurgeons to graduate fully competent because on day one, you’re out there taking care of people, but yet they don’t want those trainees touching them when they’re training. That’s obviously an impossible task, to not allow a trainee to do anything, and yet the day they graduate, they’re fully competent to practice on their own.

That’s one of the difficulties involved in training someone to do neurosurgery, where we really don’t have good practice facilities where we can have them practice on cadavers — they’re really not the same. Or have models that they can use — they’re really not the same, or simulations just are not quite as good. At this point, we don’t label physicians as early in their training.

I think if you do a little bit of research when you see your surgeon, there’s a CV there. It’ll say, this is when he graduated, or she graduated from medical school. You can do the calculation on your own and say, “Wow, they just graduated from their training two years ago. Maybe I want someone who has five years under their belt or ten years under their belt.” It’s not that hard to find that information.

COWEN: How do you manage all the standing?

And:

COWEN: Putting yourself aside, do you think you’re a happy group of people overall? How would you assess that?

SCHWARTZ: I think we’re as happy as our last operation went, honestly. Yes, if you go to a neurosurgery meeting, people have smiles on their faces, and they’re going out and shaking hands and telling funny stories and enjoying each other’s company. It is a way that we deal with the enormous pressure that we face.

Not all surgeons are happy-go-lucky. Some are very cold and mechanical in their personalities, and that can be an advantage, to be emotionally isolated from what you’re doing so that you can perform at a high level and not think about the significance of what you’re doing, but just think about the task that you’re doing.

On the whole, yes, we’re happy, but the minute you have a complication or a problem, you become very unhappy, and it weighs on you tremendously. It’s something that we deal with and think about all the time. The complications we have, the patients that we’ve unfortunately hurt and not helped — although they’re few and far between, if you’re a busy neurosurgeon doing complex neurosurgery, that will happen one or two times a year, and you carry those patients with you constantly.

Fun and interesting throughout, definitely recommended.  And I will again recommend Schwartz’s book Gray Matters: A Biography of Brain Surgery.

Sentences to ponder

In fact, it was the Obama administration that paused funding for high-risk GoF studies in 2014. The ban was lifted by none other than Donald Trump in 2017. At the time, outlets like Scientific American and Science covered the decision, in articles that quoted scientists talking about what could go wrong. Remind yourself of this the next time you see rightists trumpeting some headline showing the media being wrong about something.

That is from Richard Hanania’s Substack.