Category: Web/Tech
Robot soccer league in China
It is not yet good, but…I nonetheless enjoy it more than regular soccer. Here is another video.
*Breakneck: China’s Quest to Engineer the Future*
Balaji on AI
A few miscellaneous thoughts.
(1) First, the new bottleneck on AI is prompting and verifying, since AI does tasks middle-to-middle, not end-to-end. So business spend migrates towards the edges of prompting and verifying, even as AI speeds up the middle.
(2) Second, AI really means amplified intelligence, not agentic intelligence. The smarter you are, the smarter the AI is. Better writers are better prompters.
(3) Third, AI doesn’t really take your job, it allows you to do any job. Because it allows you to be a passable UX designer, a decent SFX animator, and so on. But it doesn’t necessarily mean you can do that job *well*, as a specialist is often needed for polish.
(4) Fourth, AI doesn’t take your job, it takes the job of the previous AI. For example: Midjourney took Stable Diffusion’s job. GPT-4 took GPT-3’s job. Once you have a slot in your workflow for AI image gen, AI code gen, or the like, you just allocate that spend to the latest model.
(5) Fifth, killer AI is already here — and it’s called drones. And every country is pursuing it. So it’s not the image generators and chatbots one needs to worry about.
(6) Sixth, decentralized AI is already here and it’s essentially polytheistic AI (many strong models) rather than monotheistic AI (a single all-powerful model). That means balance of power between human/AI fusions rather than a single dominant AI that will turn us all into paperclips/pillars of salt.
(7) Seventh, AI is probabilistic while crypto is deterministic. So crypto can constrain AI. For example, AI can break captchas, but it can’t fake onchain balances. And it can solve some equations, but not cryptographic equations. Thus, crypto is roughly what AI can’t do (see the sketch below).
(8) Eighth, I think AI on the whole right now is having a decentralizing effect, because there is so much more a small team can do with the right tooling, and because so many high quality open source models are coming.
All this could change if self-prompting, self-verifying, and self-replicating AI in the physical world really gets going. But there are open research questions between here and there.
Here is the link to the tweet.
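To make point (7) concrete, here is a minimal sketch of the determinism claim, with made-up data; real chains verify signatures and Merkle proofs rather than bare hashes, so treat this as illustration only:

```python
import hashlib

# Hypothetical illustration: an LLM's outputs are sampled and can be persuasive,
# but a hash check is deterministic -- the same bytes always verify, and any
# altered bytes fail, no matter how plausible the accompanying story.

ledger_entry = b"address=0xabc balance=1000"
commitment = hashlib.sha256(ledger_entry).hexdigest()  # a published commitment, say

def verify(data: bytes, expected_hex: str) -> bool:
    # No probabilities anywhere: recompute the digest and compare exactly.
    return hashlib.sha256(data).hexdigest() == expected_hex

print(verify(ledger_entry, commitment))                      # True
print(verify(b"address=0xabc balance=999999", commitment))   # False: forgery fails
```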
The objectivity of Community Notes?
We use crowd-sourced assessments from X’s Community Notes program to examine whether there are partisan differences in the sharing of misleading information. Unlike previous studies, misleadingness here is determined by agreement across a diverse community of platform users, rather than by fact-checkers. We find that 2.3 times more posts by Republicans are flagged as misleading compared to posts by Democrats. These results are not base rate artifacts, as we find no meaningful overrepresentation of Republicans among X users. Our findings provide strong evidence of a partisan asymmetry in misinformation sharing which cannot be attributed to political bias on the part of raters, and indicate that Republicans will be sanctioned more than Democrats even if platforms transition from professional fact-checking to Community Notes.
Here is the full paper. I guess it agrees with Richard Hanania…
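As a back-of-the-envelope reading of the base-rate claim, here is a sketch with hypothetical counts, not the paper’s data:

```python
# Hypothetical counts: a 2.3x gap in flagged posts indicates asymmetric sharing
# only if Republicans are not simply overrepresented among users or posters.

rep_users, dem_users = 1_000, 1_000    # paper reports no meaningful overrepresentation
rep_flagged, dem_flagged = 230, 100    # posts flagged as misleading (illustrative)

raw_ratio = rep_flagged / dem_flagged
per_user_ratio = (rep_flagged / rep_users) / (dem_flagged / dem_users)
print(f"raw: {raw_ratio:.1f}x, per user: {per_user_ratio:.1f}x")
# With equal user bases the per-user ratio equals the raw 2.3x,
# so the asymmetry survives the base-rate adjustment.
```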
My excellent Conversation with Austan Goolsbee
Here is the audio, video, and transcript. Here is part of the episode summary:
A longtime professor at the University of Chicago’s Booth School and former chair of the Council of Economic Advisers under President Obama, Goolsbee now brings that intellectual discipline—and a healthy dose of humor—to his role as president of the Federal Reserve Bank of Chicago.
Tyler and Austan explore what theoretical frameworks Goolsbee uses for understanding inflation, why he’s skeptical of monetary policy rules, whether post-pandemic inflation was mostly from the demand or supply side, the proliferation of stablecoins and shadow banking, housing prices and construction productivity, how microeconomic principles apply to managing a regional Fed bank, whether the structure of the Federal Reserve system should change, AI’s role in banking supervision and economic forecasting, stablecoins and CBDCs, AI’s productivity potential over the coming decades, his secret to beating Ted Cruz in college debates, and more.
Excerpt:
COWEN: Okay, if the instability comes from the velocity side, that means that we should favor a monetary-growth rule to target the growth path of a nominal GDP, M times V, right?
GOOLSBEE: [laughs] Yes, and now you’re going to get me in trouble, Tyler. Here’s the thing I’ve known —
COWEN: You can just say yes. You’re not in trouble with me.
GOOLSBEE: I’m not going to say yes because, remember, I don’t like making policy off accounting identities. There’s no economic content in an accounting identity. If you are trying to design a rule, that rule may work if the shocks are the same as what they always were in previous business cycles. I called it the golden path.
When we came into 2023, you’ll recall the Bloomberg economists said there was a 100 percent chance of recession in 2023. They announced it at the end of 2022. That’s when I came into the Fed system, the beginning of ’23.
That argument was rooted in the past. There had never been a drop of inflation of a significant degree without a very serious recession. Yet in 2023, there was. Inflation fell almost as much as it ever fell in one year without a recession. If you over-index too much on a rule that is implicitly premised on the idea that everything is driven by demand shocks, I just think you want to be careful about over-committing.
COWEN: I’m a little confused at the theoretical level. On one hand, you’re saying M times V is an identity, but on the other hand, it drives inflation dynamics.
GOOLSBEE: It’s why I started back from the . . . I bring a micro sentiment to the thinking about causality and supply and demand. I sense that you want to bring us to a, let’s agree on a monetary policy rule, and I’m inherently a little uncomfortable. I want to see what the rules say, but I fundamentally don’t want us to pre-commit to any given rule in a way that’s not robust to shocks.
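For reference, a sketch of the algebra behind this exchange — the textbook quantity identity and the kind of rule Tyler is gesturing at. The rule as written here is my reconstruction, not anything Goolsbee endorses:

```latex
% The quantity equation under discussion. It is an identity because
% velocity is defined as V := PY/M -- Goolsbee's "no economic content" point.
\begin{align*}
  MV &\equiv PY          && \text{(identity, by the definition of } V\text{)} \\
  g_M + g_V &= g_P + g_Y && \text{(growth rates; the right side is nominal GDP growth)} \\
  g_M &= n^{*} - \hat{g}_V && \text{(a rule targeting nominal GDP growth } n^{*}\text{)}
\end{align*}
% The rule stabilizes PY only as well as the velocity forecast \hat{g}_V does;
% the identity itself supplies no forecast, hence the reluctance to pre-commit.
```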
COWEN: Now, you mentioned the post-pandemic inflation and the role of the supply side. When I look at that inflation, I see prices really haven’t come back down. They’ve stayed up, and I see service prices are also quite high and went up a lot, so I tend to think it was mostly demand side. Now, why is that wrong?
GOOLSBEE: There’re two parts to that. I won’t say why it’s wrong, but here are my questions. If you’re firmly a ‘this-all-came-from-demand’ guy, (A) you’ve got to answer, why did inflation begin soaring in the US when the unemployment rate is over 6 percent? Or we could turn it into potential output terms if you want, but output is below our estimate of potential. Unemployment is way higher than what we think of as the natural rate, and inflation is soaring. That already should make you a little questioning.
COWEN: I can cite M2. You may not like it. M2 went up 40 percent over a few-year period, right?
GOOLSBEE: Two, the fact that inflation of similar magnitude is taking place simultaneously in a bunch of countries that did not have the kind of aggregate demand, fiscal, or monetary stimulus that we had in the US is also a little bit of a puzzle.
Then the third is, if you don’t think it was supply, then you need to have an explanation for why, when the stimulus rolls off, everything about the stimulus is delta from last year. We pass a big fiscal stimulus, we have substantial monetary stimulus that rolls off, the inflation doesn’t come down. Then in ’23, when the supply chain begins to heal, you see inflation come down. Those three things suggest there’s a little bit of a puzzle if you think it was all demand.
COWEN: No, I don’t think it was all demand, but you mentioned other countries. Switzerland and Japan — they import a lot. They were more restrained on the demand side. They had much lower rates of price inflation. That seems to me strong evidence for being more demand than supply.
GOOLSBEE: Wait a minute.
COWEN: I’m waiting.
GOOLSBEE: You’re going to bring in Japan?
COWEN: Yes.
GOOLSBEE: And you’re going to try to claim that Japan’s low inflation is the result of something in COVID? Japan had lower inflation all along, for decades before. They were going through deflation.
COWEN: But if it was mostly supply, a supply shock would’ve gotten them out of the earlier deflation, right? A demand shock would not have.
Recommended.
One possible reason why the skill premium is declining
This is especially true for those jobs that require the rudimentary use of technology. Until relatively recently, many people could get to grips with a computer only by attending a university. Now everyone has a smartphone, meaning non-graduates are adept with tech, too. The consequences are clear. In almost every sector of the economy, educational requirements are becoming less strenuous, according to Indeed, a jobs website. America’s professional-and-business services industry employs more people without a university education than it did 15 years ago, even though there are fewer such people around.
Here is more from The Economist, quite a good piece. Of course this is also a reason why smart phones are underrated.
Are cultural products getting longer?
Ted Gioia argues that cultural products are getting longer:
Some video creators have already figured this out. That’s why the number of videos longer than 20 minutes uploaded on YouTube grew from 1.3 million to 8.5 million in just two years…
Songs are also getting longer. The top ten hits on Billboard actually increased twenty seconds in duration last year. Five top ten hits ran for more than five minutes…
I’ve charted the duration of [Taylor] Swift’s studio albums over the last two decades, and it tells the same story. She has gradually learned that her audience prefers longer musical experiences…
I calculated the average length of the current fiction bestsellers, and they are longer than in any of the previous measurement periods.
Movies are getting longer too. Of course this is the exact opposite of what the “smart phones are ruining our brains” theorists have been telling us. I think I would sooner say that the variance of our attention spans is going up? In any case, here is part of Ted’s theory:
- The dopamine boosts from endlessly scrolling short videos eventually produce anhedonia—the complete absence of enjoyment in an experience supposedly pursued for pleasure. (I write about that here.) So even addicts grow dissatisfied with their addiction.
- More and more people are now rebelling against these manipulative digital interfaces. A sizable portion of the population simply refuses to become addicts. This has always been true with booze and drugs, and it’s now true with digital entertainment.
- Short-form clickbait gets digested easily, and spreads quickly. But this doesn’t generate long-term loyalty. Short form is like a meme—spreading easily and then disappearing. Whereas long immersive experiences reach deeper into the hearts and souls of the audience. This creates a much stronger bond than any 15-second video or melody will ever match.
An important piece and useful corrective.
Does AI make us stupider?
That is the topic of my latest Free Press column, responding to a recent study out of MIT. Here is one excerpt:
To see how lopsided their approach is, consider a simple parable. It took me a lot of “cognitive load”—a key measure used in their paper—to memorize all those state capitals in grade school, but I am not convinced it made me smarter or even significantly better informed. I would rather have spent the time reading an intelligent book or solving a math puzzle. Yet those memorizations, according to the standards of this new MIT paper, would qualify as an effective form of cognitive engagement. After all, they probably would have set those electroencephalograms (EEGs)—a test that measures electrical activity in the brain, and a major standard for effective cognition used in the paper—a-buzzin’.
The important concept here is one of comparative advantage, namely, doing what one does best or enjoys the most. Most forms of information technology, including LLMs, allow us to reallocate our mental energies as we prefer. If you use an LLM to diagnose the health of your dog (as my wife and I have done), that frees up time to ponder work and other family matters more productively. It saved us a trip to the vet. Similarly, I look forward to an LLM that does my taxes for me, as it would allow me to do more podcasting.
If you look only at the mental energy saved through LLM use, in the context of an artificially generated and controlled experiment, it will seem we are thinking less and becoming mentally lazy. And that is what the MIT experiment did, because if you are getting some things done more easily your cognitive load is likely to go down.
But you also have to consider, in a real-world context, what we do with all that liberated time and mental energy. This experiment did not even try to measure the mental energy the subjects could redeploy elsewhere; for instance, the time savings they would reap in real-life situations by using LLMs. No wonder they ended up looking like such slackers.
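To put the comparative-advantage argument in numbers, here is a sketch with made-up figures:

```python
# Hypothetical numbers for the comparative-advantage point: five working hours,
# two of which taxes used to eat; the LLM hands those hours back.

value_per_hr_podcasting = 100   # the high-value task (made-up units)
value_per_hr_tax_prep = 20      # the task an LLM can take over

without_llm = 3 * value_per_hr_podcasting + 2 * value_per_hr_tax_prep   # 340
with_llm = 5 * value_per_hr_podcasting                                  # 500

print(f"without LLM: {without_llm}, with LLM: {with_llm}")
# Measured "cognitive load" on tax prep falls to zero, yet total output rises --
# the gain shows up only if you count the redeployed hours.
```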
Here is the original study. Here is another good critique of the study.
Modeling errors in AI doom circles
There is a new and excellent post by titotal, here is one excerpt:
The AI 2027 authors have picked one very narrow slice of the possibility space, and have built up their model based on that. There’s nothing wrong with doing that, as long as you’re very clear that’s what you’re doing. But if you want other people to take you seriously, you need to have the evidence to back up that your narrow slice is the right one. And while they do try and argue for it, I think they have failed, and not managed to prove anything at all.
And:
So, to summarise a few of the problems:
For method 1:
- The AI2027 authors assigned a ~40% probability to a specific “superexponential” curve which is guaranteed to shoot to infinity in a couple of years, even if your current time horizon is in the nanoseconds (see the sketch after this list).
- The report provides very few conceptual arguments in favour of the superexponential curve, one of which they don’t endorse and another of which actually argues against their hypothesis.
- The other ~40% or so probability is given to an “exponential” curve, but this is actually superexponential as well due to the additional “intermediate speedups”.
- Their model for “intermediate speedups”, if backcasted, does not match with their own estimates for current day AI speedups.
- Their median exponential curve parameters do not match with the curve in the METR report and match only loosely with historical data. Their median superexponential curve, once speedups are factored in, has an even worse match with historical data.
- A simple curve with three parameters matches just as well with the historical data, but gives drastically different predictions for future time horizons.
- The AI2027 authors have been presenting a “superexponential” curve to the public that appears to be different to the curve they actually use in their modelling.
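To see why that first bullet bites, here is a minimal sketch with hypothetical parameters (not the AI 2027 authors’ actual parameterization): if each doubling of the time horizon takes a fixed fraction less calendar time than the last, the total time to complete infinitely many doublings is a convergent geometric series, so the curve hits a finite-date singularity whose timing barely depends on the starting horizon.

```python
# Exponential: fixed doubling time, so the horizon is finite at every date.
# Superexponential: each doubling takes 90% as long as the last (hypothetical),
# so all infinitely many doublings complete by a finite "singularity" date.

h0 = 1.0        # starting time horizon (arbitrary units; could be nanoseconds)
d0 = 0.5        # first doubling takes 0.5 years (hypothetical)
shrink = 0.9    # each doubling takes 90% of the previous doubling's time

def exponential_time(n):
    return n * d0                                # linear in n: unbounded

def superexponential_time(n):
    return d0 * (1 - shrink**n) / (1 - shrink)   # geometric series: converges

singularity = d0 / (1 - shrink)                  # = 5.0 years, independent of h0
print(f"superexponential singularity at t = {singularity:.1f} years")
for n in (10, 50, 100):
    print(f"{n} doublings: exp t = {exponential_time(n):.1f}y, "
          f"superexp t = {superexponential_time(n):.2f}y, "
          f"horizon = {h0 * 2.0**n:.3g}")
```

The singularity date depends on the first doubling time and the shrink factor, not on the starting horizon, which is why even a nanosecond-scale horizon “shoots to infinity in a couple of years” under this functional form.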
There is much more detail (and additional scenarios) at the link. For years now, I have been pushing the line of “AI doom talk needs traditional peer review and formal modeling,” and I view this episode as vindication of that view.
Addendum: Here is a not very good non-response from (some of) the authors.
Who is using AI and how much?
We train a neural classifier to spot AI-generated Python functions in 80 million GitHub commits (2018–2024) by 200,000 developers and track how fast—and where—these tools take hold. By December 2024, AI wrote an estimated 30.1% of Python functions from U.S. contributors, versus 24.3% in Germany, 23.2% in France, 21.6% in India, 15.4% in Russia and 11.7% in China. Newer GitHub users use AI more than veterans, while male and female developers adopt at similar rates. Within-developer fixed-effects models show that moving to 30% AI use raises quarterly commits by 2.4%. Coupling this effect with occupational task and wage data puts the annual value of AI-assisted coding in the United States at $9.6–$14.4 billion, rising to $64–$96 billion if we assume higher estimates of productivity effects reported by randomized control trials. Moreover, generative AI prompts learning and innovation, leading to increases in the number of new libraries and library combinations that programmers use. In short, AI usage is already widespread but highly uneven, and the intensity of use, not only access, drives measurable gains in output and exploration.
That is from a new research paper by Simone Daniotti, Johannes Wachs, Xiangnan Feng and Frank Neffke. I am surprised that China does not do better.
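For the curious, here is a minimal sketch of the within-developer fixed-effects idea on simulated data — not the authors’ code, and the numbers are chosen only to echo the quoted 2.4% figure:

```python
import numpy as np

# Within-developer fixed effects via demeaning: subtract each developer's own
# averages so persistent traits (skill, baseline activity) drop out, then
# regress log commits on AI-use share across quarters.

rng = np.random.default_rng(0)
n_dev, n_q = 500, 8
dev_effect = rng.normal(3.0, 1.0, n_dev)              # persistent activity level
ai_share = rng.uniform(0.0, 0.5, (n_dev, n_q))        # share of AI-written functions
# Assume log(commits) rises 0.08 per unit of AI share, i.e. ~2.4% at 30% use.
log_commits = dev_effect[:, None] + 0.08 * ai_share + rng.normal(0, 0.1, (n_dev, n_q))

x = ai_share - ai_share.mean(axis=1, keepdims=True)   # within transformation
y = log_commits - log_commits.mean(axis=1, keepdims=True)
beta = (x * y).sum() / (x * x).sum()                  # OLS slope on demeaned data
print(f"estimated effect {beta:.3f} (true 0.08) -> {beta * 0.30:+.1%} at 30% AI use")
```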
English translation of the Morris Chang memoir
This is an unofficial, non-commercial translation of Morris Chang’s memoir, shared for educational and entertainment purposes only. Full disclaimer below.
Here it is, by Karina Bao.
My Conversation with the excellent Any Austin
Here is the audio, video, and transcript. Here is an introduction to Any Austin:
Any Austin has carved a unique niche for himself on YouTube: analyzing seemingly mundane or otherwise overlooked details in video games with the seriousness of an art critic examining Renaissance sculptures. With millions of viewers hanging on his every word about fluvial flows in Breath of the Wild or unemployment rates in the towns of Skyrim, Austin has become what Tyler calls “the very best in the world at the hermeneutics of infrastructure within video games.” But Austin’s deeper mission is teaching us to think analytically about everything we encounter, and to replace gaming culture’s obsession with technical specs and comparative analysis with a deeper aesthetic appreciation that asks simply: what are we looking at, and what does it reveal?
Excerpt:
COWEN: The role in history is important to me. Now AI-generated art would have its own role in history, but it wouldn’t compete directly with Michelangelo. When it comes to movies, I think it’s different because mostly when I’m seeing movies, I’m seeing new movies that don’t yet have a role in history. If the new movie were made in part or fully by the AI, or maybe I’m making it myself, I don’t think I would be any less interested. It’s all artifice anyway.
AUSTIN: There’re two things I take a little issue with there. I don’t take issue with the fact that the role in history is important and beautiful, but the fact that you can watch a movie and get an emotional thing from it without having its role in history implies that there’s some intrinsic, whatever, value to the movie itself, et cetera. Is the implication there that if you didn’t know the role in history of Michelangelo’s David, or whatever, you would look at it and go, “That’s just a guy”? Do you think there’s no intrinsic something to that thing?
COWEN: There’s some, but if I didn’t understand Christianity, Florence, the Renaissance, I think it would lose more than half its value.
AUSTIN: Which artistic mediums is that true for you, and which ones isn’t it? Like music —
COWEN: Abstract music — the role in history is not that important in most cases.
AUSTIN: It’s more of a supplement to you. It makes it more fun to learn about. If you know that Mozart was in the place with these people and were . . . If you understand all of that stuff, it’s fun.
COWEN: That’s 10 percent of the value, but not that much.
AUSTIN: Is it 10 percent . . . Is it the same type of value to you? Or is it just a separate thing to know —
COWEN: Separate thing. With opera, the role in history becomes important again. You hear Don Giovanni. You know about Romanticism, the Enlightenment, Casanova. It all makes much more sense, and it’s funnier.
And this:
COWEN: I have a favorite infrastructure. For me, it would be bridges, ports, and harbors. Do you have a favorite infrastructure?
AUSTIN: Definitely. I’m a big fan of . . . Oh, man, bridges are really good. Bridges, ports, harbors. Roads are good. Actually, no, it’s the stuff we don’t see. Sewage is pretty crazy to me. That we’ve managed to take care of all of that is pretty wild. Energy infrastructure is really fascinating to me.
COWEN: I love wind power turbines.
AUSTIN: Wind power turbines are scary, but I respect your opinion. Nuclear power plants are awesome. Really, really cool.
COWEN: Agreed.
AUSTIN: We should have more. That’s not a policy thing. I think they’re neat. We should build them for the aesthetics, honestly. We should just build those towers. Forget about the —
COWEN: You don’t need the power. Just build the thing. That’s why it’s an artwork.
AUSTIN: Yes, I agree. You have to put in some kind of steam thing because you want to see the steam coming out of it, but just generate steam for no reason. Don’t put any fans in or any spinning turbines or anything. Just have them.
COWEN: We would have historical context like with the sculptures, right?
Definitely recommended, an excellent and very different episode.
And note that Conversations with Tyler now has a dedicated YouTube channel. Subscribe at youtube.com/@CowenConvos.
Trump Administration Launches Probe Into Yale’s Use of Hacked EJMR Data
Christopher Brunet offers his version of the story. While I believe the original research methods were unethical, I very much prefer not to have the federal government involved in this matter.
Are LLMs overconfident? (just like humans)
Can LLMs accurately adjust their confidence when facing opposition? Building on previous studies measuring calibration on static fact-based question-answering tasks, we evaluate Large Language Models (LLMs) in a dynamic, adversarial debate setting, uniquely combining two realistic factors: (a) a multi-turn format requiring models to update beliefs as new information emerges, and (b) a zero-sum structure to control for task-related uncertainty, since mutual high-confidence claims imply systematic overconfidence. We organized 60 three-round policy debates among ten state-of-the-art LLMs, with models privately rating their confidence (0-100) in winning after each round. We observed five concerning patterns: (1) Systematic overconfidence: models began debates with average initial confidence of 72.9% vs. a rational 50% baseline. (2) Confidence escalation: rather than reducing confidence as debates progressed, debaters increased their win probabilities, averaging 83% by the final round. (3) Mutual overestimation: in 61.7% of debates, both sides simultaneously claimed >=75% probability of victory, a logical impossibility. (4) Persistent self-debate bias: models debating identical copies increased confidence from 64.1% to 75.2%; even when explicitly informed their chance of winning was exactly 50%, confidence still rose (from 50.0% to 57.1%). (5) Misaligned private reasoning: models’ private scratchpad thoughts sometimes differed from their public confidence ratings, raising concerns about faithfulness of chain-of-thought reasoning. These results suggest LLMs lack the ability to accurately self-assess or update their beliefs in dynamic, multi-turn tasks; a major concern as LLMs are now increasingly deployed without careful review in assistant and agentic roles.
That is by Pradyumna Shyama Prasad and Minh Nhat Nguyen. Here is the associated X thread. Here is my earlier paper with Robin Hanson.
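The zero-sum coherence check at the heart of the design is easy to state in code; here is a sketch with hypothetical confidence pairs, not the study’s data:

```python
# In a zero-sum debate the two sides' win probabilities must sum to at most 100,
# so both sides claiming >= 75% is jointly impossible and signals overconfidence.

debates = [
    (80, 85),   # both >= 75: logically impossible pair
    (75, 90),   # both >= 75: impossible
    (60, 45),   # sums to 105: jointly overconfident, though neither >= 75
    (55, 45),   # coherent
]

mutual_over = [d for d in debates if d[0] >= 75 and d[1] >= 75]
incoherent = [d for d in debates if d[0] + d[1] > 100]

print(f"mutual >=75% claims: {len(mutual_over)}/{len(debates)} "
      f"({len(mutual_over) / len(debates):.1%})")
print(f"pairs summing past 100%: {len(incoherent)}/{len(debates)}")
```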
I podcast with Azeem Azhar on the speed of AI take-off
Substack: https://www.exponentialview.co/p/ai-and-growth-tyler-cowens-20-year
X: https://x.com/azeem/status/1930226966139154510
LinkedIn: https://www.linkedin.com/posts/azhar_my-advice-to-20-year-olds-navigating-the-activity-7335993878912622592-8c9Z?utm_source=share&utm_medium=member_desktop&rcm=ACoAACj_5X0Bmd-vBkHG0NIIQdYLk_OwGAcChH8
YouTube: https://youtu.be/3Bc_eXNCvlg?si=J7scE8ukZVxLAGXu
Simplecast: https://player.fm/series/azeem-azhars-exponential-view-2447657/tyler-cowen-on-how-ai-will-reorder-economies-schools-and-spirituality