Category: Web/Tech

Sometimes it is hard to solve for the equilibrium

Probably you all know about this:

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance.

According to reports that are not yet confirmed but likely true, it was shown that the model could be jailbroken.  The released Mythos already restricted bio and “AI improvement” queries, rather strictly in fact, so now we are back to the model not being available.

Here are a few of the constraints on the U.S. government, not the only ones, I might add:

1. It needs the main companies to stay in business.  On top of that, it wants their IPOs to go reasonably well.  And it is now much harder for the top companies to recruit foreigners, who make up a significant share of their highest-quality workforce (Demis, Ilya, Andrej for a start).  It is also much harder for the main companies to drum up foreign business in a credible and sustainable manner.

1b. How are American multinationals operating abroad supposed to use top systems, moving forward?

2. It wants to use model access as a tool of both hard and soft power, so model access has to be possible at some level.  But it is very hard to control what foreign agents will do with their partial model access, once they get it in the future.

3. The U.S. needs to stay ahead of China in the AI race.

4. The U.S. needs to issue restrictions that are actually enforceable, and “U.S. citizens only” does not fit that bill.  Furthermore (markets in everything!) it is easy enough to hire a traitorous American to access tools of wrongdoing, or for that matter it is not difficult to fake citizenship in various ways.

5. USG cannot nationalize these companies and then proceed to run them effectively.

6. Chinese and other open source models do in fact improve at some reasonable pace, even if they are right now considerably behind the best proprietary models.

Is the most likely scenario that the government hardens some of its own systems and takes some further precautions, and then allows Mythos to be rereleased?  Perhaps with some additional safeguards?

Is there such a thing as a model that cannot be jailbroken at all?  I doubt that.

So basically we will be replaying this scenario periodically over time, but each time with the companies, and also the government, in a weaker and more precarious position.

I am willing to reject the philosophy of “safetyism” and bite various associated bullets.  As it stands, these actions will not succeed in making us safer, including for the reasons mentioned above.  Our regulatory institutions, attitudes, and approaches simply are not well suited to an era of radical innovation.

In any case these events do not surprise me (they do surprise me in their immediate suddenness however), as this kind of approach is what governments have been about for a long time now, USG included or perhaps USG especially.

Rising in status: Leopold, Aesop, and also Mistral.  AI nationalism.  Proponents of slow take-off as the likely scenario.  Reticent, quiet CEOs.  As for China, will they rush into this opportunity, or are they at least as scared as we are?

How did Stanislaw Lem imagine advanced computer intelligence?

…GOLEM’s behavior is unpredictable.  Sometimes it converses courteously with people, whereas on other occasions any attempt at contact misfires.  GOLEM sometimes cracks jokes, too, though its sense of humor is fundamentally different from man’s.  Much depends on its interlocutors.  In exceptional cases GOLEM will show a certain interest in people who are talented in a particular way; it is intrigued, so to speak, not by mathematical aptitude — not even the greatest — but rather by interdisciplinary forms of talent; on several occasions it has predicted with uncanny accuracy achievements by young, as yet unknown, scientists in a field which it has itself indicated.  (After a brief exchange it informed T. Vroedel, age twenty-two and then only a doctoral candidate, “You will become a computer,” which was supposed to mean, more or less, “You will become somebody.”)

That is from Lem’s Imaginary Magnitude, an extraordinary book in parts.  Most of all, see his Golem XIV section on how an AGI (our term, not his) is likely to behave.

Again, the research paper format will be dying out

From Xudong Han:

‘Recently, I came across a paper co-authored by 37 authors from Stanford, CMU, Michigan, and elsewhere: *The Last Human-Written Paper*.

The core argument is pretty brutal: the paper format we’ve been using for centuries might already be obsolete in the AI era.

The authors point out two “invisible taxes” that we’ve long overlooked:

One is the narrative tax. To tell a compelling story, we delete failed experiments, dead ends, and overturned hypotheses. What AI reads is a “walkthrough guide” to beating the game, but it misses the truly valuable “pitfall logs.”

The other is the engineering tax. The implementation details in papers are usually enough to convince reviewers, but not enough for an Agent to directly reproduce. Many key tricks are still buried in the authors’ heads, code comments, and Slack threads.

So the authors propose ARA, transforming papers directly into “research packages” that Agents can read and execute: not just telling you the conclusions, but packaging in how they were reached, how the code runs, where the evidence chain is, and which paths led nowhere.

I think the most intriguing part of this paper is that it’s not discussing how AI can help humans write papers—it’s asking:

When AI also becomes a reader and executor of papers, should papers still look like they do today?

In the future, the core of research output might no longer be “how much it resembles a paper,” but whether it can be understood, reproduced, traced, and iteratively extended by AI.

Humans have been writing papers for centuries—next, we might start writing research packages for Agents to execute.’
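To make the “research package” idea more concrete, here is a minimal sketch of what an agent-readable manifest could hold: claims, a run command, a claim-to-evidence map, and a log of attempts that keeps the dead ends. Every class, field name, and file path below is hypothetical and chosen for illustration; this is not the ARA format the authors propose.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    """One path the researchers tried; failures are kept, not deleted."""
    description: str
    outcome: str  # hypothetical labels, e.g. "worked", "failed", "abandoned"
    notes: str = ""


@dataclass
class ResearchPackage:
    """Hypothetical manifest for an agent-readable research package.

    Field names are illustrative assumptions, not the ARA schema.
    """
    claims: list[str]          # the conclusions
    run_command: str           # how the code runs
    evidence: dict[str, str]   # claim -> artifact that supports it
    attempts: list[Attempt] = field(default_factory=list)  # the "pitfall log"

    def dead_ends(self) -> list[Attempt]:
        # The paths a narrative paper would usually cut (the "narrative tax").
        return [a for a in self.attempts if a.outcome != "worked"]

    def unsupported_claims(self) -> list[str]:
        # Claims an agent cannot trace to any evidence artifact.
        return [c for c in self.claims if c not in self.evidence]


# Usage with made-up contents (reproduce.py and table1.csv are hypothetical).
pkg = ResearchPackage(
    claims=["method A beats baseline", "effect holds at scale"],
    run_command="python reproduce.py --seed 0",
    evidence={"method A beats baseline": "results/table1.csv"},
    attempts=[
        Attempt("tried method B", "failed", "diverged after 2 epochs"),
        Attempt("method A", "worked"),
    ],
)
print([a.description for a in pkg.dead_ends()])
print(pkg.unsupported_claims())
```

The point of the design is that the “pitfall log” and the evidence chain become fields an agent can query, rather than details left in the authors’ heads.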

Here is my earlier post on whether the research paper will die out.  By the way, as a side point, has anyone mentioned that, due to the writing-detection abilities of AI models, anonymous referee reports are now a thing of the past?

A simple reason for skepticism about the iPhones/fertility link

Here is the background to the debate.  Here is more from Noah.  Here is a thread from researcher Caitlin Myers.  And here is some basic information:

In 2008, 1.9% of the mobile-subscribing population had an iPhone wireless subscription.  As a percent of all adults, that is 1.6%.

In 2009, it is 4.3%.  3.6% of all adults.

In 2010, 6.8%.  5.5% of all adults.

Plus conception to birth takes nine months (give or take!), and actual family planning may make this lag far longer.  In 2008 fertility rates already were falling pretty sharply.  The whole “maybe the iPhone messes up your dating processes” factor also requires some time to operate, especially since the iPhone as a network of many, many users, and whatever negative effects on socializing you think that might have, still lay in the future.  And what you could access on the iPhone then was far more limited than today.

So when the authors talk about diffusion explaining 33–52% of the decline in the general fertility rate among American women 15–44, I still do not get how that is supposed to operate.

The explanations I am hearing seem to be parasitic on intuitions about the world of 2026, not the time period under consideration.
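To make the timing point concrete, here is a minimal sketch that lines up the adult penetration figures above with the births they could plausibly affect. It assumes a flat one-year lag from conception to birth; since family planning can stretch that lag further, a one-year lag if anything overstates exposure at conception.

```python
from typing import Optional

# iPhone subscriptions as a percent of all U.S. adults, from the figures above.
ADULT_SHARE = {2008: 1.6, 2009: 3.6, 2010: 5.5}

# Assumption: births in year t mostly reflect conceptions in year t - 1
# (about nine months of gestation); longer family-planning lags are ignored.
LAG_YEARS = 1


def exposure_at_conception(birth_year: int) -> Optional[float]:
    """Adult iPhone share (percent) in the approximate conception year, if known."""
    return ADULT_SHARE.get(birth_year - LAG_YEARS)


for birth_year in range(2009, 2012):
    share = exposure_at_conception(birth_year)
    print(f"births in {birth_year}: {share}% of adults had an iPhone, "
          f"{100 - share:.1f}% did not")
```

Even for 2011 births, under this generous lag, roughly 94.5% of adults had no iPhone at the time of conception.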

What do the AIs think of us?

Asked to answer as a typical human, every cutting-edge model rated us markedly more neurotic, less open, less agreeable and less conscientious than they rated themselves. The gap on Neuroticism alone is 1.69 points on a 5-point scale.

Here is more material of interest.  And this:

Across 31 models from those seven labs they answer the personality tests in unison: high openness, low Dark Triad, Universalism on top, Power dead last in every single model.

Are the AIs conscious?

That is the topic of my latest Free Press column.  I will spare you the discussion of the AIs, but here is what I have to say about the humans:

I am here to tell you that there is no ghost in the machine. But perhaps more importantly, there is barely a “ghost” in your own human machine. “Are people conscious?” is a better and more scientifically plausible question than whether AIs are conscious.

If there is one near-universal tendency of humans, it is to attribute intent where none is present. Prehistoric humans anthropomorphized nature and attributed natural events to good and bad deities. These kinds of beliefs persist today, not only in the folk religions of the world, but in human obsessions with fortune tellers, tarot cards, and the supernatural…

If there is one systematic flaw that humans have, it is an excessive willingness to ascribe conscious intent and to anthropomorphize purely natural and material entities. It seems we are strongly disposed toward this bias.

Yet few of us are willing to examine what is perhaps the biggest and most significant way we make this mistake. When it comes to understanding ourselves, so many of us assume that “we are in charge.” We identify our phenomenological stream of consciousness with our actual selves, and treat that consciousness stream as the true decision maker.

The reality is that you—whatever we take that concept to mean—make most or maybe all of your actual decisions in parts of your brain that precede what you take to be the conscious choice. Among experts in neuroscience, this is not a controversial proposition. As brain surgeon Theodore Schwartz explained to me: “I do not think we have free will in the way that most people do. I think that our brains make decisions for us. We carry out those behaviors, and then we write a story that makes it into a logical timeline that makes us feel as if we were the ones, that there was a self that made that decision, whereas, in fact, that self didn’t really exist.”

…Sometimes I like to say that “I am only conscious at the margin.” Tongue in cheek, I will suggest that I am only conscious enough to avoid the self-contradiction of asserting that I am not conscious at all. I feel I am honest enough to just not be very impressed by my own flow of conscious awareness or its ability to perform complex calculations. Still, I recognize that it is all I have got, so I need to treasure it, however paltry it may be.

And by the way, I do not think the AIs are conscious, any more than I believe in Thor the Thunder God.

Stanislaw Lem foresaw drones

This was published in English (and Polish) in 1986 under the title One Human Minute:

So it was not humanoid automata that formed the new armies but synthetic insects (synsects) — ceramic microcrustacea, titanium annelids, and flying pseudo-hymenoptera with nerve centers made of arsenic compounds and with stingers of heavy, fissionable elements…The flying synsect combined plane, pilot, and missile in one miniature whole.  But the operating unit was the microarmy, which possessed superior combat effectiveness only as a whole (just as a colony of bees was an independent, surviving unit while a single bee was nothing).

…The nonliving, synthetic “locust” was incomparably more lethal, since it was made that way by its designers.  It possessed a preprogrammed autonomy, so that communication with a command center was unnecessary.

…the microarmy was one giant flowing or flying aggregate of self-assembling elements.  It started out dispersed, approaching its objective from many different directions, as strategy or tactics demanded, in order to concentrate into a preprogrammed whole on the battlefield.  For this fighting material did not leave the factory in final shape, ready for use, like tanks or guns loaded on a railroad flatcar; the mechanisms were microproductive blocks designed to fuse together into a war machine at the designated place.  For this reason, such armies were called “self-bonding.”

…Amid a swarm of self-guided, programmed microarms, a man in uniform was as helpless as a Roman legionary with sword and shield against a hail of bullets.  In the face of special types of biotropic microarms capable of destroying everything that lived, human beings had no choice but to abandon the battlefield, for they would be killed in seconds…

A microarmy could easily penetrate all systems of defense and go deep into enemy territory.  It had no more trouble accomplishing this than did rain or snow.  Meanwhile, high-powered nuclear weapons were proving more and more useless on the battlefield.

Lem is always worth reading.

How well does current AI find errors in economics papers?

Can artificial intelligence (AI) refute economic theory? I document experiments in which I asked several AI models (Gemini, Refine, Claude, and ChatGPT) to check the correctness of four published papers in economic theory, each containing an error that I helped identify or correct. ChatGPT Pro performed best, occasionally constructing counterexamples and corrected proofs, while other models fared worse. However, no model located a true error without substantial human guidance, and data contamination complicates interpretation. I argue that a competent human paired with a frontier model can outperform current peer review, but AI cannot yet refute economic theory on its own.

That is from a new piece by Alexis Akira Toda.

New paper on the iPhone and fertility

The U.S. general fertility rate has fallen by 22% since 2007, a sustained decline not readily explained by economic conditions, contraceptive use, housing or childcare costs, or other commonly cited factors. We assess the potential role of a different shock: the diffusion of the smartphone. The U.S. rollout of the iPhone, the first modern smartphone, provides a natural experiment: from June 2007 through February 2011, the device was sold only on AT&T, allowing us to identify its effect from variation in AT&T’s mobile broadband coverage. Entropy-balanced Poisson and synthetic difference-in-differences event studies imply that access to the iPhone reduced births by 4.5–8.0% at ages 15–19 and 3.2–6.6% at ages 20–24, with statistically significant but smaller declines among older cohorts. Placebo analyses applied to Verizon and Sprint’s pre-2011 coverage footprint are null. Taken together, these cohort effects imply that the diffusion of the iPhone deepened the decline in births among women under 30 while suppressing the rise in births among older women. Overall, the diffusion of the iPhone explains 33–52% of the decline in the general fertility rate among women aged 15–44. National-survey evidence on time use and sexual behavior is consistent with the iPhone reducing in-person interactions, increasing pornography use, and reducing sexual frequency.

That is from Caitlin K. Myers and Ezekiel Hooper.  An interesting and difficult-to-discuss question is how much we actually want teen fertility rates to decline, and to what extent we should consider such declines a good thing.

Note also that, as this study is set up, it does not discriminate against the “the iPhone effect on fertility is mainly a thing of timing” hypothesis.  And here is a Paul Novosad comment.
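Putting the abstract’s two headline numbers together gives a rough sense of scale. A minimal sketch, assuming the 33–52% share applies to the full 22% decline in the general fertility rate since 2007 (the abstract does not say outright that the two figures cover exactly the same window):

```python
TOTAL_DECLINE = 0.22             # fall in the U.S. general fertility rate since 2007
SHARE_EXPLAINED = (0.33, 0.52)   # share of the decline attributed to iPhone diffusion


def attributed_decline(total: float, share: float) -> float:
    """Portion of the total decline (as a fraction of the 2007 rate) attributed to the iPhone."""
    return total * share


for share in SHARE_EXPLAINED:
    iphone = attributed_decline(TOTAL_DECLINE, share)
    other = TOTAL_DECLINE - iphone
    print(f"share {share:.0%}: iPhone {iphone:.1%} of the 2007 rate, "
          f"everything else {other:.1%}")
```

On that reading, the iPhone alone would account for a drop of roughly 7 to 11 percent of the 2007 fertility rate.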

Might AI hurt corporate profits? (from my email)

From Clifford Sosin:

I loved your talk about AI and wanted to bounce an idea off you.

I think AI may be bad for corporate profit margins.

A lot of companies make money because their customers can’t be bothered to monitor them more closely, or to insource something. Customers let the company make some money in exchange for doing a decent-enough job and making the problem go away.

Bank of America has $2 trillion of deposits, not a penny of which is optimized. Most enterprise software vendors could be switched out far more often, or displaced by home-built software, but it’s too much of a pain. I could run a 12-party RFP for an Uber ride or a pair of socks, but I don’t.

In a sense, many professionals are an extension of the same idea. I could research my own real estate law, or my own insurance, whether business or personal, but I don’t because it would be too hard.

Google Search might be the biggest example. It makes money because advertisers know they need to be at the top of the results to be found. But my agent will happily search all the results across multiple search engines.

AI agents should change all this. By acting as incredibly rational and vigilant sourcing agents, CFOs, and experts for their users, they will take rents previously collected by these toll-takers and redistribute them to consumers.

And I don’t think the AI stack itself necessarily makes much profit. Commodity and open-weight models are hot on the heels of the major model companies, and competition in GPUs should intensify. Indeed, making a GPU is in some ways similar to making software, so perhaps it can commoditize substantially. Chip manufacturing may remain high-margin, but there are now plenty of entrants drawn in by the shortage who could make TSMC’s market more competitive over time.

Some companies will win. Low-cost providers may gain share as customers switch more often. Richer consumers may consume more high-end goods. Companies with genuinely advantaged business models and limited competition will be able to become more efficient. But my overriding sense is that the equilibrium outcome is lower margins for companies.

Of course, people will build new businesses, and maybe they will use AI to generate very high margins in ways I haven’t considered. That would prove me wrong.

But if this lower-margin hypothesis is true, the knock-on effects are probably positive for AI adoption, since it will make the models more popular with consumers.

And if your view is that AI drives GDP growth to be only 5–10% higher over the next decade, it’s possible that a 100–200 bp decline in corporate margins from roughly 12% would mean companies in aggregate don’t see much benefit — or in fact lose — even as consumers are better off.
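The arithmetic in that last paragraph can be checked with a back-of-the-envelope sketch. It assumes, as a simplification not stated in the email, that aggregate corporate revenue moves one-for-one with GDP, so profits are just margin times revenue:

```python
from itertools import product

BASE_MARGIN = 0.12           # aggregate corporate margin, roughly 12%
GDP_BOOSTS = (0.05, 0.10)    # AI makes GDP 5-10% higher over the decade
MARGIN_CUTS_BP = (100, 200)  # decline in margins, in basis points


def profit_change(gdp_boost: float, cut_bp: int, base_margin: float = BASE_MARGIN) -> float:
    """Change in aggregate profits relative to the no-AI baseline.

    Assumption: corporate revenue scales one-for-one with GDP, so
    profits = margin * revenue.
    """
    new_margin = base_margin - cut_bp / 10_000
    return (1 + gdp_boost) * new_margin / base_margin - 1


for boost, cut in product(GDP_BOOSTS, MARGIN_CUTS_BP):
    print(f"GDP +{boost:.0%}, margin -{cut} bp: "
          f"profits {profit_change(boost, cut):+.1%}")
```

Under that assumption, only the best case (GDP 10% higher, margins down 100 bp) leaves profits slightly above the baseline; the other three combinations show outright declines.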

Is work from home bad for your mental health?

From the “Results” section:

Relative to those in nonremotable jobs, workers in remotable jobs spent approximately one additional hour alone per workday after the pandemic. Those in remotable jobs also differentially increased days spent entirely alone and decreased after-work socializing. The rise in isolation was sharpest for those living alone, whose likelihood of spending the whole day without social contact rose by 7 percentage points (83%).

Mental distress simultaneously increased: Scores on the Kessler (K-6) measure of generalized psychological distress rose by 0.1 standard deviations for those in remotable jobs relative to those in nonremotable jobs. The increase in distress was roughly twice as large for those living alone compared with those living with family. Alternative measures of mental distress—such as the frequency of depression, mental health care utilization, and antidepressant prescriptions—show similar trends. In contrast, workers in remotable jobs did not differentially increase visits to non–mental health care providers or non–mental health prescriptions (statins, for example), suggesting that the change was not merely driven by increased flexibility for doctor visits.

That is from a recently published paper by Natalia Emanuel, Emma Harrington, and Amanda Pallais.
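Note the mix of units in “7 percentage points (83%)”: the first is an absolute change, the second is relative to the baseline. A quick sketch backs out the implied baseline, assuming both figures describe the same change for those living alone and bearing in mind that both are rounded:

```python
def implied_baseline(pp_change: float, relative_change: float) -> float:
    """Baseline level (in percent) implied by an absolute change in percentage
    points and the same change expressed as a fraction of the baseline."""
    return pp_change / relative_change


before = implied_baseline(7.0, 0.83)
after = before + 7.0
print(f"likelihood of a whole day without social contact: "
      f"about {before:.1f}% before, {after:.1f}% after")
```

So, roughly, one day in twelve spent without social contact became one day in six or seven.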

Law professors prefer AI over peer answers

Large language models (LLMs) are increasingly promoted as educational tutors, yet most evaluations focus on domains with a single ground truth. Many disciplines, however, hinge on judgment: reasoning, weighing ambiguity, and reaching defensible conclusions. Law provides a sharp test. We conducted a blinded evaluation of short-answer tutoring in contracts courses with sixteen U.S. law professors. Participants created 40 representative questions, wrote answers, and judged 2,918 anonymized comparisons between human and LLM responses. Professors rated LLMs far higher than their peers (average win rate = 75.33%), with models performing similarly to the best instructor. LLM responses were also rarely flagged as harmful (3.53%, vs 12.06% for professors). Preferences for LLM answers were consistent across evaluators and reflected shared professional standards. Our evaluation can be reliably extended to additional models by employing a separate LLM as a judge, rendering expert agreements an effective, scalable method to evaluate AI tutors in judgment-rich domains.

“far”.  That is from a new paper by Alejandro Salinas, et al.  Via Andrew Curran.  And via John Chamberlain:

Artificial intelligence (AI) and large language models (LLMs) tools are capable of mass-producing academic finance papers that are nearly indistinguishable from human-authored research, according to a new study published in the Journal of Economic Literature.

C’mon people, get ready.  I know it is difficult to admit when your human capital has been devalued, but that time is upon us.  In particular, being prolific is no longer such a comparative advantage in academia.  You might run to the “but I know what questions to ask” cope, but I implore you to solve for the equilibrium.  What is the equilibrium wage for merely asking questions?

Of course academic life and projects will continue, but the real rewards will go to people doing new, innovative, and hitherto impossible projects with AI.