Ilya’s talk
Twenty-four minutes, sixteen minutes for the core talk, self-recommending.
Jennifer Pahlka on DOGE
An excellent piece, one of the best I have read all year. Here is the concluding paragraph:
We can wish that the government efficiency agenda were in the hands of someone else, but let’s not pretend that change was going to come from Democrats if they’d only had another term, and let’s not delude ourselves that change was ever going to happen politely, neatly, carefully. However we got here, we may now be in a Godzilla vs Kong world. Perhaps we’re about to get a natural experiment in which Elonzilla faces off with Larry ElliKong. One of the things we need to be ready to learn is that Elonzilla could lose. Or worse, since Elon and Larry are friends, the expected disruptive could get co-opted. And what would that say about the problem? Conjuring Elon is not bringing a gun to a knife fight. It was never a knife fight.
Recommended.
The Effects of Gender Integration on Men
Evidence from the U.S. military:
Do men negatively respond when women first enter an occupation? We answer this question by studying the end of one of the final explicit occupational barriers to women in the U.S.: in 2016, the U.S. military opened all positions to women, including historically male-only combat occupations. We exploit the staggered integration of women into combat units to estimate the causal effects of the introduction of female colleagues on men’s job performance, behavior, and perceptions of workplace quality, using monthly administrative personnel records and rich survey responses. We find that integrating women into previously all-male units does not negatively affect men’s performance or behavioral outcomes, including retention, promotions, demotions, separations for misconduct, criminal charges, and medical conditions. Most of our results are precise enough to rule out small, detrimental effects. However, there is a wedge between men’s perceptions and performance. The integration of women causes a negative shift in male soldiers’ perceptions of workplace quality, with the effects driven by units integrated with a woman in a position of authority. We discuss how these findings shed light on the roots of occupational segregation by gender.
That is all from
A new paper on the economics of AI alignment
A principal wants to deploy an artificial intelligence (AI) system to perform some task. But the AI may be misaligned and pursue a conflicting objective. The principal cannot restrict its options or deliver punishments. Instead, the principal can (i) simulate the task in a testing environment and (ii) impose imperfect recall on the AI, obscuring whether the task being performed is real or part of a test. By committing to a testing mechanism, the principal can screen the misaligned AI during testing and discipline its behaviour in deployment. Increasing the number of tests allows the principal to screen or discipline arbitrarily well. The screening effect is preserved even if the principal cannot commit or if the agent observes information partially revealing the nature of the task. Without commitment, imperfect recall is necessary for testing to be helpful.
That is by Eric Olav Chen, Alexis Ghersengorin, and Sami Petersen. And here is a tweet storm on the paper. I am very glad to see the idea of an optimal principal-agent contract brought more closely into AI alignment discussions. As you can see, it tends to make successful alignment more likely.
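To see why more tests help, here is a toy sketch of my own, not the model in the paper. Assume the AI cannot tell tests from deployment (imperfect recall), faces n tests plus one real task, and a misaligned AI defects independently with the same probability p on each task. Assume too that any defection in testing is caught. The function names are hypothetical. Under those assumptions the chance of passing every test and then defecting in deployment is (1 - p)^n * p, which the AI maximizes at p = 1/(n + 1).

```python
def undetected_defection_prob(p: float, n_tests: int) -> float:
    """Chance a misaligned AI passes all n tests, then defects on the real task.

    Toy assumption: the AI defects independently with probability p on every
    task because it cannot distinguish a test from deployment.
    """
    return (1 - p) ** n_tests * p


def best_defection_rate(n_tests: int) -> float:
    """The p that maximizes (1 - p)^n * p, from the first-order condition."""
    return 1 / (n_tests + 1)


if __name__ == "__main__":
    for n in (1, 10, 100):
        p_star = best_defection_rate(n)
        # Even at the AI's best response, this probability shrinks as n grows.
        print(n, undetected_defection_prob(p_star, n))
```

In this toy setup the best the misaligned AI can do falls roughly like 1/(e·n), which is one simple way to read the claim that increasing the number of tests lets the principal screen or discipline arbitrarily well.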
Friday assorted links
1. Space flight during the 21st century.
2. Cato-suggested DOGE reforms.
3. AI progress is massive but increasingly non-legible.
4. One of the biggest benefits of travel: “IMO one of the biggest benefits of travel is just acquiring a scaffold to hang future knowledge on. Places that had similar embeddings in my mind before I saw them (Chongqing vs Chengdu, Abu Dhabi vs Dubai, Wroclaw vs Warsaw, etc.) become extremely distinct, and future facts become much stickier.”
5. Amazing Kreskin, RIP. Though I feel this NYT obit failed to properly appreciate him?
6. Henry Oliver defends literary criticism.
7. Gemini 2 the streaming API.
8. An innovation agenda for addiction.
9. o1 pro on how USG might be related to the NJ drones. And Sam Hammond offers a hypothesis. Here is what a NJ Senator is reporting. Domestic testing has to be the number one hypothesis at this point.
Zakaria on Rent Seeking
Fareed Zakaria on Freakonomics Radio:
ZAKARIA: You can see it in what happened a day after the election results became clear. You got a flurry of tweets from every major C.E.O. in America — every major tech C.E.O., every bank C.E.O. — fawning over Trump, congratulating him and telling him how much they wanted to work well with him. I think that this is a very sad development that’s happened. It’s not entirely because of Trump. But we have politicized the economy in America. All this industrial policy, these tariffs, these bans. What that does is it suddenly makes Washington a very crucial arbiter to the success of business. You add to it Trump, who personally loves the idea of fining Caterpillar for doing this and Harley Davidson for doing that and Chase for doing — he views it as his job as president to literally dole out rewards and punishments to companies, depending on whether they do what he regards as the right thing or the wrong thing. It’s deeply saddening to me as somebody who grew up in India, where this is business as usual. Every business had to slavishly pander to whoever the prime minister at the time was. And you see it in Musk. Tesla stock, in the two days after Trump won, was up 20 percent or something like that, adding tens of billions of dollars to Elon Musk’s net worth. Nothing fundamental in the economics had changed for Tesla. There was just an expectation, now that he was a friend of Trump’s, that he was going to somehow be showered with federal largesse. You know, there’s a guy in India called Adani who’s Modi’s best friend, and his stocks trade at multiples 10 times that of every other Indian company. Because everyone assumes that at the end of the day, being Modi’s best friend is worth $100 billion or something like that.
DUBNER: That’s probably a pretty safe assumption.
ZAKARIA: It’s a safe assumption in India. What’s tragic is it might even be a safe assumption in America. But it’s not what the American economy was supposed to be about. And I think it’s a very sad trend.
Hat tip: Larry White.
On the Gukesh match (from my email)
- Unusual, relative to other competitions, that the (undisputed?) top 3 players weren’t competing for the title. The quality of play seemed correspondingly lower than other big tournaments.
- Ding really is quite an anomaly. Some massive holds despite admitting in interviews to not seeing some pretty straightforward lines. Huge props for him to reach game 14 the way he did with a win in game 12.
- Very very sad way (for both him and Gukesh) to end the match.
- Gukesh will be an excellent champion and ambassador of the game.
- The format looks massively stressful, and I get why Magnus doesn’t play it anymore, but I think the format in a way is actually perfect. It rewards massive amounts of prep with an engine, plus provides a serious psychological battle, which all together actually seems to be a pretty good examination of what chess is in a world where humans are inferior to computers. Rapid is more fun to play, but classical really tests all your mental faculties.
- Next world championship could be a great one. Maybe a last chance for Fabi or Hikaru to win. Also very likely we see an all-Indian match.
That is from S., all astute observations.
My Conversation with the excellent Paula Byrne
Here is the audio, video, and transcript. Here is part of the episode summary:
Tyler and Paula discuss Virginia Woolf’s surprising impressions of Hardy, why Wessex has lost a sense of its past, what Jude the Obscure reveals about Hardy’s ideas about marriage, why so many Hardy tragedies come in doubles, the best least-read Hardy novels, why Mary Robinson was the most interesting woman of her day, how Georgian theater shaped Jane Austen’s writing, British fastidiousness, Evelyn Waugh’s hidden warmth, Paula’s strange experience with poison pen letters, how American and British couples are different, the mental health crisis among teenagers, the most underrated Beatles songs, the weirdest thing about living in Arizona, and more.
This was one of the most fun — and funny — CWTs of all time. But those parts are best experienced in context, so I’ll give you an excerpt of something else:
COWEN: Your book on Evelyn Waugh, the phrase pops up, and I quote, “naturally fastidious.” Why can it be said that so many British people are naturally fastidious?
BYRNE: Your questions are so crazy. I love it. Did I say that? [laughs]
COWEN: I think Evelyn Waugh said it, not you. It’s in the book.
BYRNE: Give me the context of that.
COWEN: Oh, I’d have to go back and look. It’s just in my memory.
BYRNE: That’s really funny. It’s a great phrase.
COWEN: We can evaluate the claim on its own terms, right?
BYRNE: Yes, we can.
COWEN: I’m not sure they are anymore. It seems maybe they once were, but the stiff-upper-lip tradition seems weaker with time.
BYRNE: The stiff upper lip. Yes, I think Evelyn Waugh would be appalled with the way England has gone. Naturally fastidious, yes, it’s different to reticent, isn’t it? Fastidious — hard to please, it means, doesn’t it? Naturally hard to please. I think that’s quite true, certainly of Evelyn Waugh because he was naturally fastidious. That literally sums him up in a phrase.
COWEN: If I go to Britain as an American, I very much have the feeling that people derive status from having negative opinions more than positive. That’s quite different from this country. Would you agree with that?
Definitely recommended, one of my favorite episodes in some while. And of course we got around to discussing Paul McCartney and Liverpool…
Another look at the New Jersey drone evidence
Here is the chain of examples, one can assume not all are represented properly, nonetheless I remain stumped.
Thursday assorted links and non-links
1. Slightly salacious betting markets in everything.
3. From my email: “I’m curious, if you have the time to reply, about your personal Straussianism and if you believe scarce context is what makes the world livable, or if it’s actually a tragedy.”
4. The music of Sid Meier’s Civilization.
5. A partial guide to the New Right.
6. Zvi on o1.
Tetlock on Testing Grand Theories with AI
Testing grand theories of politics (or economics) is difficult because such theories are always contingent on ceteris paribus assumptions but outside of a lab, all else is rarely the same. The great Philip Tetlock has run multi-decade forecasting experiments but these are time and resource consuming. Tetlock, however, now suggests that LLMs could speed the process of testing grand theories like Mearsheimer’s neo-realism theory of politics:
With current or soon to be available technology, we can instruct large language models (LLMs) to reconstruct the perspectives of each school of thought, circa 1990, and then attempt to mimic the conditional forecasts that flow most naturally from each intellectual school. This too would be a multi-step process:
1. Ensuring the LLMs can pass ideological Turing tests and reproduce the assumptions, hypotheses and forecasts linked to each school of thought. For instance, does Mearsheimer see the proposed AI model of his position to be a reasonable approximation? Can it not only reproduce arguments that Mearsheimer explicitly endorsed from 1990-2024 but also reproduce claims that Mearsheimer never made but are in the spirit of his version of neorealism. Exploring views on historical counterfactual claims would be a great place to start because the what-ifs let us tease out the auxiliary assumptions that neo-realists must make to link their assumptions to real-world forecasts. For instance, can the LLMs predict how much neorealists would change their views on the inevitability of Russian expansionism if someone less ruthless than Putin had succeeded Yeltsin? Or if NATO had halted its expansion at the Polish border and invited Russia to become a candidate member of both NATO and the European Union?
2. Once each school of thought is satisfied that the LLMs are fairly characterizing, not caricaturing, their views on recent history (the 1990-2024 period), we can challenge the LLMs to engage in forward-in-time reasoning. Can they reproduce the forecasts for 2025-2050 that each school of thought is generating now? Can they reproduce the rationales, the complex conditional propositions, underlying the forecasts, and do so to the satisfaction of the humans whose viewpoints are being mimicked?
3. The final phase would test whether the LLMs are approaching superhuman intelligence. We can ask the LLMs to synthesize the best forecasts and rationales from the human schools of thought in the 1990-2024 period, and create a coherent ideal-observer framework that fits the facts of the recent past better than any single human school of thought can do but that also simultaneously recognizes the danger of over-fitting the facts (hindsight bias). We can also then challenge these hypothesized-to-be-ideal-observer LLMs to make more accurate forecasts on out-of-sample questions, and craft better rationales, than any human school of thought.
Gender Composition and Group Behavior
Evidence from city councils:
How does gender composition influence individual and group behavior? To study this question empirically, we assembled a new, national sample of United States city council elections and digitized information from the minutes of over 40,000 city-council meetings. We find that replacing a male councilor with a female councilor results in a 25 p.p. increase in the share of motions proposed by women. This is despite causing only a 20 p.p. increase in the council female share. The discrepancy is driven, in part, by behavioral changes similar to those documented in laboratory-based studies of gender composition. When a lone woman is joined by a female colleague, she participates more actively by proposing more motions. The apparent changes in behavior do not translate into clear differences in spending. The null finding on spending is not driven by strategic voting; however, preference alignment on local policy issues between men and women appears to play an important role. Taken together, our results both highlight the importance of nominal representation for cultivating substantive participation by women in high-stakes decision making bodies; and also provide evidence in support of the external validity of a large body of laboratory-based work on the consequences of group gender composition.
That is from a new NBER working paper by
Health insurance companies are not the main villain
First of all, insurance companies just don’t make that much profit. UnitedHealth Group, the company of which Brian Thompson’s UnitedHealthcare is a subsidiary, is the most valuable private health insurer in the country in terms of market capitalization, and the one with the largest market share. Its net profit margin is just 6.11%…
That’s only about half of the average profit margin of companies in the S&P 500. And other big insurers are even less profitable. Elevance Health, the second-biggest, has a margin of between 2% and 4%. Centene’s margin is usually around 1% to 2%. Cigna Group’s margin is usually around 2% to 3%. And so on. These companies are just making very little profit at all.
And:
In other words, Americans’ much-hated private health insurers are paying a higher percent of the cost of Americans’ health care than the government insurance systems of Sweden and Denmark and the UK are paying. The only reason Americans’ bills are higher is that U.S. health care provision costs so much more in the first place.
And:
In fact, the Kaiser Family Foundation does detailed comparisons between U.S. health care spending and spending in other developed countries. And it has concluded that most of this excess spending comes from providers — from hospitals, pharma companies, doctors, nurses, tech suppliers, and so on…
Recommended, here is the full post.
The New Jersey drone sightings
Here is one short clip. They are almost certainly from humans, but whose humans? These incidents also have some bearing on UAP debates. When the UAPs are from humans, even from an advanced tech program (whether ours or others), it is in fact pretty obvious that “these are a bunch of somebody’s drones.” Update your p’s accordingly. They seem to be tracking some British airbases as well.
Can you trust the mayor of Belleville? A New Jersey state senator agrees.
Wednesday assorted links
1. Semantic search for all CWTs.
2. Job postings for Ph.D economists are down.
4. Krugman’s last NYT column (though he is not retiring more generally he says).
5. Thread on quantum computing. And Scott Aaronson on Google Willow.