Category: Web/Tech

How research in math will change (from my email)

From GA:

I am a mathematician…and some of your recent comments on MR about the role of AI in Econ research as well as the (disappearing?) role of academic papers inspired this response. (It is partially but not exclusively about academia, so I hope it is ok that I’m sending it to your GMU address. Also, I’m hoping it doesn’t get flagged as spam because of the ai in the title…)

In no particular order:

-In the course of a math career, one accumulates lots of computational guesses, now one can test those with minimal effort.

-One also accumulates lots of incomplete and half formed drafts, proofs of special cases, etc, etc. Running those past claude and chatgpt can (does!) pay off. A lot of math is cleverly applying linear algebra and while I’m very good at linear algebra, I’m not as good at it as the AI’s are.

-The lower hanging fruit here are slightly off the beaten track, but not esoteric subjects. If you have a good overview of such, you can pretty quickly prod ai’s into making progress on them. (Before, you needed to have a school of grad students for that). Basic techniques (graph theory, algebra, calculus..) that ai’s are already good at can push these forward already. Making progress on truly hot topics is harder.

-There are some quite smart people trying to measure just how good autonomous ai’s are at math (e.g. the first batch project). That’s a fun game, but for practical purposes right now, what is relevant is how good an ai is when guided by a motivated human. I suspect we’ll see some remarkable things on that front in the next few years once math people really grok the good routines.

-For instance, getting claude and chatgpt to referee each other’s arguments is fun, and they genuinely have different insights on parts of the same problem.

-The kids will be all right. Right now, they are making pocket change doing ai training and have developed a better “feel” for the different ai’s than I probably ever will. And they learn things by asking the ai to explain an argument to them instead of trying to decipher a math book or paper.

-Which brings me to your papers point. I notice that a project informed with the right context is much more informative to me than the physical pdf of a math paper, and much easier to extract information out of by just asking the thing.

-Refereeing will look very different very soon. All the referee reports that have been collected by the journals should be valuable, hard to get data. And running all the accepted and published math papers through ai’s as the ‘control’ will end with quite a few people having egg on their face. It’s like self-driving cars, but there is no refereeing union.

-The last really big math revolution was all the stuff in the wake of Witten and 4-manifold stuff predicted by string theory in the early 90’s. This is going to be so much bigger than that. Buckle up.

AI-Native Firms

Very important work from Hyunjin Kim and Rembrand Koning, of INSEAD and HBS respectively:

We study how firms built around AI capabilities (“AI-native” firms) are organized. Drawing on Y Combinator batches W20-F24 and U.S. venture-backed startups whose first financing closed between 2020 and 2024, we classify each firm’s AI-native status and link it to workforce microdata on team size, function, seniority, and hierarchy. Relative to non-AI startups in the same industry-cohort, AI-native firms are 25% smaller. Their share of engineers is 13% greater, and the shares of entry-level workers and managers are each roughly 15% lower. Their hierarchies are half a seniority level flatter, yet valuations are comparable, implying more value created per employee. We argue these patterns reflect two channels: a process channel, in which AI changes how people work inside the firm, and a product channel, in which AI capabilities are built into what the firm sells. Using text from product descriptions and job postings, we find that embedding AI into the product, beyond layering on AI tools into existing workflows, is a primary way startups are scaling “knowledge work” without large teams of knowledge workers.

The tweet storm on the new paper is especially useful.  Via Luis Garicano.  And note those results predate the very latest and best tools.
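
To see what "more value created per employee" means in rough numbers, here is a minimal back-of-the-envelope sketch. It assumes, purely for illustration, that "comparable" valuations means equal valuations and that "25% smaller" applies to headcount; the baseline of 100 employees and the variable names are hypothetical and not from the paper.

```python
# Back-of-the-envelope only. Assumptions (not from the paper):
# - "comparable" valuations are treated as exactly equal
# - "25% smaller" is read as 25% fewer employees
# - the baseline headcount of 100 is an arbitrary normalization

baseline_headcount = 100.0   # hypothetical non-AI startup
baseline_valuation = 1.0     # normalized valuation, same for both firm types

ai_native_headcount = baseline_headcount * (1 - 0.25)  # 25% smaller
ai_native_valuation = baseline_valuation               # assumed equal

value_per_employee_baseline = baseline_valuation / baseline_headcount
value_per_employee_ai = ai_native_valuation / ai_native_headcount

# Ratio of valuation per employee, AI-native relative to non-AI
ratio = value_per_employee_ai / value_per_employee_baseline
print(f"Valuation per employee, AI-native vs. non-AI: {ratio:.2f}x")
```

Under those assumptions the ratio is simply 1 / 0.75, or about a third more valuation per employee. If valuations are only roughly comparable, the figure moves accordingly.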

My Conversation with Dave Baszucki

Dave is CEO and co-founder of Roblox, and here is the audio, video, and transcript.  From the episode summary:

With over 100 million daily active users and projected revenue bookings of $7 billion this year, it is one of the largest gaming economies in the world—and one that has made millionaires out of teenage developers in Argentina, South Korea, and everywhere in between.

Tyler and Dave explore why Roblox decided early against prioritizing advertising revenue, why Dave thinks the main competition of Roblox is its own execution speed rather than Fortnite, whether every mega platform inevitably becomes an everything app, how falling token costs will change the platform, why he insists all the games on Roblox are beautiful, whether Robux should have a floating exchange rate, why admitting you have kids under 13 on your platform turns out to be a competitive advantage, why he’s skeptical of blanket social media bans, what his son’s experience with bipolar disorder taught him about metabolic health, his two-year sabbatical between companies that involved a motorhome trip across North America and a stint hosting talk radio in Santa Cruz, why Mutiny on the Bounty remains one of his favorite books, what he’ll learn next, and much more.

Excerpt:

COWEN: What percentage of your games now do you feel are beautiful?

BASZUCKI: All of them.

COWEN: Some look just quite ordinary. They might be fun, but I wouldn’t say they’re beautiful, right?

BASZUCKI: Well, I was trying to go a couple levels out of the box on you there. The reason I feel they’re beautiful is when you said that, I immediately went to look and feel, but then I tried to imagine the 12-year-old or the 18-year-old or the 30-year-old struggling to build something wonderful and the human connection to those games. By that definition, I think they’re all beautiful. They are all the efforts of creation of real people trying to pour their hearts out to make something that other people love to play.

On an artistic basis, I think you could ask me what percent of paintings in the MoMA do I think are beautiful. I’d probably say 20 percent. If I had to look at 1,000 Roblox games, I wouldn’t name which is more beautiful to me because I think that’s less important than really the heartfelt work of all the creators.

COWEN: I’ve been struck when I look at gaming at how much people don’t seem to care much about the visual beauty of their games. I would have expected something different, say, 15 years ago, and they just want a game that engages them somehow. Normal standards of visual beauty seem to have fallen away. Is that incorrect? Would you correct that impression in some manner?

BASZUCKI: I think you’re absolutely correct. What I feel you may actually be describing, if we looked into other disciplines, the evolution of story from the campfire to written to audio to a movie, and the increasing fidelity; all of those stories, in a way, are beautiful, but at the time, for the vast majority of the creators, it may be that writing is just easier than producing a 4K Hollywood movie. I feel that’s a little bit like the metaphor you’re talking about right now in gaming.

For the vast majority of people, their story or their idea for their game is actually pretty beautiful. Whether it’s a fashion game like Dress to Impress or it’s a grow garden game, the games are arguably beautiful, even if they don’t look photorealistic. What I think we’ll see is, over time, as AI helps accelerate the ability to make games look really polished in any style the creator wants—could be photorealistic, could be anime, could be a Warner Brothers 2D cartoon look—you and I might say that looks more beautiful, but the core gameplay is still somewhat the original gameplay. I think we are going to see games arguably look more beautiful, even though I think they’re all beautiful.

The dialogue is a bit slow to get underway, but there are many interesting parts.

Do teens regret their social media use?

A new study by Irish researcher Eoin Whelan attempts to answer this. Dr. Whelan told me he was specifically inspired by Haidt’s 2024 claims and sought to examine them rigorously and in the context of other regrets. This is a great use of science…testing dramatic public claims. So…do they hold up?

In Dr. Whelan’s study, 389 young adult participants (20-24) who were social media users as teens were asked about their regrets regarding their teenage years. A list of 20 possible teenage regrets was asked of all participants, with degree of regret marked on a 7-point Likert scale. This is an interesting design…testing social media regrets against other possible regrets, putting them in better context than the crude survey Haidt relied on.

So how did social media regrets hold up? Out of 20 possible regrets, too much time on social media ranked 13th. The top regrets were 1.) not sticking up for oneself, 2.) being too self-conscious, 3.) not documenting memories, 4.) not learning practical life skills and 5.) not getting help with mental health. Girls were slightly more likely to regret time on social media than boys (ranking 11th vs 13th) though this effect was very small (I estimated it at about = .11) so hardly the big “vulnerable girls” narrative some have peddled.

Further, regrets over time spent on social media as a teen did not predict current young adult life satisfaction for either boys or girls. Thus such regrets may be more a symptom of current panics over social media than anything of actual life importance. Of the regrets, only not working harder in school and not exercising negatively predicted young adult life satisfaction. Interestingly, having regrets over socializing with friends positively predicted life satisfaction.

As Dr. Whelan noted in his study, “The objective of this study was to critically examine the commonly held belief that social media use during teenage years is a significant source of regret and a predictor of diminished well-being in early adulthood…Contrary to dominant narratives in the public domain, our results suggest that regrets over time spent on social media are not among the most potent regrets reported by young adults…As such, these results align with prior research indicating that the harmful effects of social media may be overstated.”

Here is the full Chris Ferguson Substack.

Can Online Activity Be Regulated? Evidence from Adult Websites

The consequences of online regulations depend on the extent to which users can circumvent restrictions or substitute toward noncompliant platforms. Since 2023, 25 U.S. states have implemented age verification laws that caused prominent adult websites (including Pornhub) to restrict local access for all users. We study how these restrictions affected browsing activity using individual-level panel data. Access restrictions reduced overall time spent on adult sites by roughly 10%. Specifically, for every 100 hours spent on top adult sites before restrictions, about 50 hours remained accessible at noncompliant sites that never restricted access, 30 hours persisted through VPN-based circumvention, 10 hours were substituted from compliant sites to noncompliant sites, and 10 hours were no longer spent on adult sites.

That is from a new NBER working paper by Matthew Brown, Emily J. Davis, and Devin G. Pope.
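
The decomposition in the abstract can be checked with a few lines of arithmetic. This is a minimal sketch using only the four rounded figures quoted above. The dictionary labels are my own paraphrases, and the "share of restricted hours lost" is my own derived reading, not a number reported by the authors.

```python
# Hours per 100 baseline hours on top adult sites, as quoted in the abstract.
# The key names are paraphrases, not the authors' terminology.
breakdown = {
    "already at noncompliant sites": 50,
    "kept via VPN circumvention": 30,
    "shifted from compliant to noncompliant sites": 10,
    "no longer spent on adult sites": 10,
}

baseline_hours = sum(breakdown.values())
hours_lost = breakdown["no longer spent on adult sites"]

# Overall reduction in time spent, relative to all baseline hours
reduction_share = hours_lost / baseline_hours

# Assumption: hours "exposed" to a restriction are all hours that were not
# already at sites that never restricted access.
restricted_hours = baseline_hours - breakdown["already at noncompliant sites"]
lost_share_of_restricted = hours_lost / restricted_hours

print(f"Overall reduction: {reduction_share:.0%}")
print(f"Reduction among restricted hours: {lost_share_of_restricted:.0%}")
```

The four components sum to 100, matching the roughly 10% overall reduction. On this reading, about one fifth of the hours that actually faced a restriction went away, with the rest preserved through VPNs or substitution.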

AI nationalism, Europe included

Most of my Free Press column deals with Mythos, but here are some remarks on Europe:

There is yet another huge problem behind all these first-order problems. Let us say, for instance, that France’s Mistral AI develops very nicely and serves as an EU counterpart of Anthropic and OpenAI. Well, then the other European countries will become highly dependent on the French. That may seem okay today, but it will be much less fun for the Germans if the French really do have all that extra power and leverage.

As for the French themselves, they would be highly dependent on a private company. France may end up with one such company, but it is unlikely to have three of them. So Mistral will in turn have high leverage over France, French politics, and French foreign policy. Let us hope they are up to that. The simple point is that being influenced by someone in your home country, even if it sounds more appealing rhetorically, is not always better than being pushed around by foreigners. Sometimes the foreigners are less oppressive and intrusive, if only because they care less about you.

Worth a ponder.  I am hearing good things about the new Mistral model, so these questions may become relevant sooner than I had thought when writing this.

General-purpose large language models outperform specialized clinical AI tools on medical benchmarks

This result does not surprise me at all.  Here is part of the abstract:

Frontier LLMs outperformed clinical AI tools in all three evaluations. Clinical AI tools performed comparably to auto-enabled Google Search AI Overview on the RCQ. These findings highlight the need for independent, real-world evaluation of AI tools before they enter clinical settings.

From Krithik Viswanath, et al.  As a side note, this (and the more general version of the point) is one big reason why some fairly large number of Emergent Ventures proposals are rejected rather quickly.

Sometimes it is hard to solve for the equilibrium

Probably you all know about this:

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance.

According to not yet confirmed but likely true reports, it was shown that the model could be jailbroken.  The released Mythos already restricted bio and “AI improvement” queries, rather strictly in fact, so now we are back to the model not being available.

Here are a few of the constraints on the U.S. government, not the only ones I might add:

1. It needs for the main companies to stay in business.  On top of that, it wants their IPOs to go reasonably well.  And it is now much harder for the top companies to recruit foreigners, which is a significant share of their highest quality workforce (Demis, Ilya, Andrej for a start).  It is also much harder for the main companies to drum up foreign business in a credible and sustainable manner.

1b. How are American multinationals operating abroad supposed to use top systems, moving forward?

2. It wants to use model access as a tool of both hard and soft power, so model access has to be possible at some level.  But it is very hard to control what foreign agents will do with their partial model access, when they get it in the future.

3. The U.S. needs to stay ahead of China in the AI race.

4. The U.S. needs to issue restrictions that are actually enforceable, and “U.S. citizens only” does not fit that bill.  Furthermore (markets in everything!) it is easy enough to hire a traitorous American to access tools of wrongdoing, or for that matter it is not difficult to fake citizenship in various ways.

5. USG cannot nationalize these companies and then proceed to run them effectively.

6. Chinese and other open source models do in fact improve at some reasonable pace, even if they are right now considerably behind the best proprietary models.

Is the most likely scenario that the government hardens some of its own systems and takes some further precautions, and then allows Mythos to be rereleased?  Perhaps with some additional safeguards?

Is there such a thing as a model that cannot be jailbroken at all?  I doubt that.

So basically we will be replaying this scenario periodically over time, but with each time the companies and also the government in a weaker and more precarious position.

I am willing to reject the philosophy of “safetyism” and bite various associated bullets.  As it stands, these actions will not succeed in making us safer, including for the reasons mentioned above.  Our regulatory institutions, attitudes, and approaches simply are not well suited to an era of radical innovation.

In any case these events do not surprise me (they do surprise me in their immediate suddenness however), as this kind of approach is what governments have been about for a long time now, USG included or perhaps USG especially.

Rising in status: Leopold, Aesop, and also Mistral.  AI nationalism.  Proponents of slow take-off as the likely scenario.  Reticent, quiet CEOs.  As for China, will they rush into this opportunity, or are they at least as scared as we are?

How did Stanislaw Lem imagine advanced computer intelligence?

…GOLEM’s behavior is unpredictable.  Sometimes it converses courteously with people, whereas on other occasions any attempt at contact misfires.  GOLEM sometimes cracks jokes, too, though its sense of humor is fundamentally different from man’s.  Much depends on its interlocutors.  In exceptional cases GOLEM will show a certain interest in people who are talented in a particular way; it is intrigued, so to speak, not by mathematical aptitude — not even the greatest — but rather by interdisciplinary forms of talent; on several occasions it has predicted with uncanny accuracy achievements by young, as yet unknown, scientists in a field which it has itself indicated.  (After a brief exchange it informed T. Vroedel, age twenty-two and then only a doctoral candidate, “You will become a computer,” which was supposed to mean, more or less, “You will become somebody.”)

That is from Lem’s Imaginary Magnitude, an extraordinary book in parts, most of all see his Golem IV section on how an AGI (our term, not his) is likely to behave.

Again, the research paper format will be dying out

From Xudong Han:

Recently, I came across a paper co-authored by 37 authors from Stanford, CMU, Michigan, and elsewhere: *The Last Human-Written Paper*.

The core argument is pretty brutal: the paper format we’ve been using for centuries might already be obsolete in the AI era.

The authors point out two “invisible taxes” that we’ve long overlooked:

One is the narrative tax. To tell a compelling story, we delete failed experiments, dead ends, and overturned hypotheses. What AI reads is a “walkthrough guide” to beating the game, but it misses the truly valuable “pitfall logs.”

The other is the engineering tax. The implementation details in papers are usually enough to convince reviewers, but not enough for an Agent to directly reproduce. Many key tricks are still buried in the authors’ heads, code comments, and Slack threads.

So the authors propose ARA, transforming papers directly into “research packages” that Agents can read and execute: not just telling you the conclusions, but packaging in how they were reached, how the code runs, where the evidence chain is, and which paths led nowhere.

I think the most intriguing part of this paper is that it’s not discussing how AI can help humans write papers—it’s asking:

When AI also becomes a reader and executor of papers, should papers still look like they do today?

In the future, the core of research output might no longer be “how much it resembles a paper,” but whether it can be understood, reproduced, traced, and iteratively extended by AI.

Humans have been writing papers for centuries—next, we might start writing research packages for Agents to execute.

Here is my earlier post on whether the research paper will die out.  By the way, as a side point has anyone mentioned that, due to writing detection abilities of AI models, anonymous referee reports are now a thing of the past?

A simple reason for skepticism about the iPhones/fertility link

Here is the background to the debate.  Here is more from Noah.  Here is a thread from researcher Caitlin Myers.  And here is some basic information:

In 2008, 1.9% of the mobile-subscribing population had an iPhone wireless subscription.  As a percent of all adults, that is 1.6%.

In 2009, it is 4.3%.  3.6% of all adults.

In 2010, 6.8%.  5.5% of all adults.

Plus conception to birth takes nine months (give or take!), noting that actual family planning may make this lag far longer.  In 2008 fertility rates already were falling pretty sharply.  The whole “maybe the iPhone messes up your dating processes” factor also requires some time to operate, especially since the iPhone as a network of many, many users, and whatever negative effects on socializing you think that might have, was still to lie in the future.  And what you could access on the iPhone then was far more limited than today.

So when the authors talk about diffusion explaining 33–52% of the decline in the general fertility rate among American women 15–44, I still do not get how that is supposed to operate.

The explanations I am hearing seem to be parasitic on world intuitions from 2026, not the time period under consideration.
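
To make the scale of early adoption concrete, here is a small sketch that tabulates the figures quoted above and computes the complementary share of adults without an iPhone subscription. It uses only the percentages given in this post. The variable and function names are my own, and the "without" share is simple subtraction, not a number from the underlying sources.

```python
# year -> (iPhone share of mobile subscribers, iPhone share of all adults),
# both in percent, as quoted in the post above.
iphone_share = {
    2008: (1.9, 1.6),
    2009: (4.3, 3.6),
    2010: (6.8, 5.5),
}

def adults_without_iphone(adult_share_pct: float) -> float:
    """Percent of all adults with no iPhone subscription (simple complement)."""
    return round(100.0 - adult_share_pct, 1)

without = {year: adults_without_iphone(adult) for year, (_, adult) in iphone_share.items()}

for year, (subs, adult) in iphone_share.items():
    print(f"{year}: {subs}% of subscribers, {adult}% of adults, "
          f"{without[year]}% of adults without an iPhone")
```

Even at the end of this window, well over nine in ten adults had no iPhone subscription, which is the core of the skepticism expressed above.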

What do the AIs think of us?

Asked to answer as a typical human, every cutting-edge model rated us markedly more neurotic, less open, less agreeable and less conscientious than they rated themselves. The gap on Neuroticism alone is 1.69 points on a 5-point scale.

Here is more material of interest.  And this:

Across 31 models from those seven labs they answer the personality tests in unison: high openness, low Dark Triad, Universalism on top, Power dead last in every single model.

Are the AIs conscious?

That is the topic of my latest Free Press column.  I will spare you the discussion of the AIs, but here is what I have to say about the humans:

I am here to tell you that there is no ghost in the machine. But perhaps more importantly, there is barely a “ghost” in your own human machine. “Are people conscious?” is a better and more scientifically plausible question than whether AIs are conscious.

If there is one near-universal tendency of humans, it is to attribute intent where none is present. Prehistoric humans anthropomorphized nature and attributed natural events to good and bad deities. These kinds of beliefs persist today, not only in the folk religions of the world, but in human obsessions with fortune tellers, tarot cards, and the supernatural…

If there is one systematic flaw that humans have, it is an excessive willingness to ascribe conscious intent and to anthropomorphize purely natural and material entities. It seems we are strongly disposed toward this bias.

Yet few of us are willing to examine what is perhaps the biggest and most significant way we make this mistake. When it comes to understanding ourselves, so many of us assume that “we are in charge.” We identify our phenomenological stream of consciousness with our actual selves, and treat that consciousness stream as the true decision maker.

The reality is that you—whatever we take that concept to mean—make most or maybe all of your actual decisions in parts of your brain that precede what you take to be the conscious choice. Among experts in neuroscience, this is not a controversial proposition. As brain surgeon Theodore Schwartz explained to me: “I do not think we have free will in the way that most people do. I think that our brains make decisions for us. We carry out those behaviors, and then we write a story that makes it into a logical timeline that makes us feel as if we were the ones, that there was a self that made that decision, whereas, in fact, that self didn’t really exist.”

…Sometimes I like to say that “I am only conscious at the margin.” Tongue in cheek, I will suggest that I am only conscious enough to avoid the self-contradiction of asserting that I am not conscious at all. I feel I am honest enough to just not be very impressed by my own flow of conscious awareness or its ability to perform complex calculations. Still, I recognize that it is all I have got, so I need to treasure it, however paltry it may be.

And by the way I do not think the AIs are conscious, no more than I believe in the Thunder God of Thor.

Stanislaw Lem foresaw drones

This was published in English (and Polish) in 1986 under the title One Human Minute:

So it was not humanoid automata that formed the new armies but synthetic insects (synsects) — ceramic microcrustacea, titanium annelids, and flying pseudo-hymenoptera with nerve centers made of arsenic compounds and with stingers of heavy, fissionable elements…The flying synsect combined plane, pilot, and missile in one miniature whole.  But the operating unit was the microarmy, which possessed superior combat effectiveness only as a whole (just as a colony of bees was an independent, surviving unit while a single bee was nothing).

…The nonliving, synthetic “locust” was incomparably more lethal, since it was made that way by its designers.  It possessed a preprogrammed autonomy, so that communication with a command center was unnecessary.

…the microarmy was one giant flowing or flying aggregate of self-assembling elements.  It started out dispersed, approaching its objective from many different directions, as strategy or tactics demanded, in order to concentrate into a preprogrammed whole on the battlefield.  For this fighting material did not leave the factory in final shape, ready for use, like tanks or guns loaded on a railroad flatcar; the mechanisms were microproductive blocks designed to fuse together into a war machine at the designated place.  For this reason, such armies were called “self-bonding.”

…Amid a swarm of self-guided, programmed microarms, a man in uniform was as helpless as a Roman legionary with sword and shield against a hail of bullets.  In the face of special types of biotropic microarms capable of destroying everything that lived, human beings had no choice but to abandon the battlefield, for they would be killed in seconds…

A microarmy could easily penetrate all systems of defense and go deep into enemy territory.  It had no more trouble accomplishing this than did rain or snow.  Meanwhile, high-powered nuclear weapons were proving more and more useless on the battlefield.

Lem is always worth reading.