Category: Web/Tech
What can LLMs never do?
That is by Rohit Krishnan. He and I are both interested in the question of what LLMs cannot do, and why. Here is one excerpt:
It might be best to say that LLMs demonstrate incredible intuition but limited intelligence. It can answer almost any question that can be answered in one intuitive pass. And given sufficient training data and enough iterations, it can work up to a facsimile of reasoned intelligence.
The fact that adding an RNN type linkage seems to make a little difference though by no means enough to overcome the problem, at least in the toy models, is an indication in this direction. But it’s not enough to solve the problem.
In other words, there’s a “goal drift” where as more steps are added the overall system starts doing the wrong things. As contexts increase, even given previous history of conversations, LLMs have difficulty figuring out where to focus and what the goal actually is. Attention isn’t precise enough for many problems.
A closer answer here is that neural networks can learn all sorts of irregular patterns once you add an external memory.
And:
In LLMs as in humans, context is that which is scarce.
Interesting throughout.
“The Simple Macroeconomics of AI”
That is the new Daron Acemoglu paper, and he is skeptical about AI’s overall economic effects. Here is part of the abstract:
Using existing estimates on exposure to AI and productivity improvements at the task level, these macroeconomic effects appear nontrivial but modest—no more than a 0.71% increase in total factor productivity over 10 years. The paper then argues that even these estimates could be exaggerated, because early evidence is from easy-to-learn tasks, whereas some of the future effects will come from hard-to-learn tasks, where there are many context-dependent factors affecting decision-making and no objective outcome measures from which to learn successful performance. Consequently, predicted TFP gains over the next 10 years are even more modest and are predicted to be less than 0.55%.
Note he is not suggesting TFP (total factor productivity, a measure of innovation) will go up by 0.71 percentage points (a plausible estimate, in my view); he is saying it will go up 0.71% over a ten-year period, or by about 0.07% annually. Here is the explanation of method:
I show that when AI’s microeconomic effects are driven by cost savings (equivalently, productivity improvements) at the task level—due to either automation or task complementarities—its macroeconomic consequences will be given by a version of Hulten’s theorem: GDP and aggregate productivity gains can be estimated by what fraction of tasks are impacted and average task-level cost savings. This equation disciplines any GDP and productivity effects from AI. Despite its simplicity, applying this equation is far from trivial, because there is huge uncertainty about which tasks will be automated or complemented, and what the cost savings will be.
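To make the arithmetic concrete, here is a minimal sketch in Python. The Hulten-style calculation multiplies the share of tasks affected by the average cost saving on those tasks. The two input values and both function names are hypothetical, chosen only for illustration, and are not taken from the paper. The second function converts a cumulative ten-year gain, such as the abstract's 0.71%, into a compound annual rate.

```python
def hulten_gain(task_share: float, avg_cost_saving: float) -> float:
    """Aggregate productivity gain under a Hulten-style approximation:
    share of tasks affected times average cost saving on those tasks."""
    return task_share * avg_cost_saving


def annualized_rate(cumulative_gain: float, years: int) -> float:
    """Compound annual growth rate implied by a cumulative gain."""
    return (1.0 + cumulative_gain) ** (1.0 / years) - 1.0


# Hypothetical inputs, for illustration only (not from the paper).
example_share = 0.05   # 5% of tasks affected
example_saving = 0.15  # 15% average cost saving on those tasks
example_gain = hulten_gain(example_share, example_saving)

# The abstract's headline number: 0.71% cumulative over 10 years.
annual = annualized_rate(0.0071, 10)

print(f"Hulten-style gain: {example_gain:.2%}")
print(f"Annualized rate:   {annual:.3%}")
```

The annualized figure comes out at roughly 0.07% per year, which is the annual number cited in the text.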
Mostly I think this piece is wrong, and I think it is wrong for reasons of economics. It is not that I think the estimate is off, I think the method is misleading altogether.
As with international trade, a lot of the benefits of AI will come from getting rid of the least productive firms from within the distribution. This factor is never considered.
And as with international trade, a lot of the benefits of AI will come from “new goods.” Since the prices of those new goods previously were infinity (do note the degree of substitutability matters), those gains can be much higher than what we get from incremental productivity improvements. The very popular Character.ai is already one such new good, not to mention I and many others enjoy playing around with LLMs just about every day.
By the way, the core model of this paper — see pp.6-7 — postulates only a single good for the economy. Mention of the contrary case does surface on p.11, and starting with p.19, where most of the attention is devoted to bad new goods, such as more effective manipulation of consumers. Note the paper doesn’t have any empirical argument as to why most new AI goods might be bad for social welfare.
pp.34-35 focus on the possibility of a public goods problem for AI use, similar to what has been suggested for social media. That discussion seems very far from both current practices with AI and most of the speculation from AI experts. Do I have to use Midjourney because all of my friends do, and I wish the whole thing didn’t exist? Or rather do I simply find it to be great fun, as do many people when they create their own songs with AI? It is dubious to play up the prisoner’s dilemma effects so much, but Acemoglu returns to this point with much force in the conclusion.
Toward the end he writes:
Productivity improvements from new tasks are not incorporated into my estimates. This is for three reasons. First and most parochially, this is much harder to measure and is not included in the types of exposure considered in Eloundou et al. (2023) and Svanberg et al. (2024). Second, and more importantly, I believe it is right not to include these in the likely macroeconomic effects, because these are not the areas receiving attention from the industry at the moment, as also argued in Acemoglu (2021), Acemoglu and Restrepo (2020b) and Acemoglu and Johnson (2023). Rather, areas of priority for the tech industry appear to be around automation and online monetization, such as through search or social media digital ads. Third, and relatedly, more beneficial outcomes may require new institutions, policies and regulations, as also suggested in Acemoglu and Johnson (2023) and Acemoglu et al. (2023).
While many of the points in that paragraph seem outright wrong to me (such as the industry attention point), what he can’t bring himself to say is that the gains from such new tasks will in fact be small. Because they won’t be. But whether or not you agree, what is going on in the paper is that the gains from AI measure as small because it is assumed AI will not be doing new things. I just don’t see why it is worth doing such an exercise.
A more general question is whether this model can predict that TFP moves around as much as it does. I am pretty sure the answer there is “no,” not anywhere close to that.
On the general approach, I found this sentence (p.4) very odd: “…my framework also clarifies that what is relevant for consumer welfare is TFP, rather than GDP, since the additional investment comes out of consumption.” I would say what is relevant for consumer welfare is the sum of consumer and producer surpluses, of which TFP is not a sufficient statistic. This unusual “redefinition of all welfare economics in a single sentence” perhaps follows from how many other gains from trade he has abolished from the system? And footnote six is odd and also wrong: “For example, if AI models continue to increase their energy requirements, this would contribute to measured GDP, but would not be a beneficial change for welfare.” Even for dirty energy that might be wrong, not to mention for green energy. If an innovation induces the market to invest more in a service, the costs of that added investment simply do not scuttle the gains altogether. And if Acemoglu wants to argue that weird welfare economics is true in his model, that is a good argument against his model, not a good argument that such gains would not count in the real world, which is what this paper is supposed to be about.
Acemoglu explicitly rules out gains from doing better science, as they may not come within the ten-year time frame. On that one, he is the prisoner of his own assumptions. If many gains come in say years 10-15, I would just say the paper is misleading, even if his words are defensible in the purely literal sense.
That said, just how much does the “no new science” clause rule out? In terms of an economic model, how does “new science” differ from “TFP”? I am not sure, nor are we given clear guidance. Is better software engineering “new science”? Maybe so? Won’t we get a lot of that within ten years? Don’t we have some of it already?
In sum, I don’t think this paper at all establishes the “small gains point” it is trying to promote in the abstract.
It is perfectly fair to point out that the optimists have not shown large gains, but in this paper the deck is entirely — and unfairly — stacked in the opposite direction.
For the pointer I thank Gabriel.
Dwarkesh interviews Mark Zuckerberg on Llama 3 and more
Claims about claims (it’s happening)
The craziest LLaMA 3 reveal:
The 400B+ version of the model is **on par with Claude 3 Opus**, and it's still training.
Soon, we'll have a better-than-Opus, fully open-source model.
The implications are huge. pic.twitter.com/chaDp7jaCa
— Matt Shumer (@mattshumer_) April 18, 2024
Here is from Meta.
Do protests matter?
Only rarely:
Recent social movements stand out by their spontaneous nature and lack of stable leadership, raising doubts on their ability to generate political change. This article provides systematic evidence on the effects of protests on public opinion and political attitudes. Drawing on a database covering the quasi-universe of protests held in the United States, we identify 14 social movements that took place from 2017 to 2022, covering topics related to environmental protection, gender equality, gun control, immigration, national and international politics, and racial issues. We use Twitter data, Google search volumes, and high-frequency surveys to track the evolution of online interest, policy views, and vote intentions before and after the outset of each movement. Combining national-level event studies with difference-in-differences designs exploiting variation in local protest intensity, we find that protests generate substantial internet activity but have limited effects on political attitudes. Except for the Black Lives Matter protests following the death of George Floyd, which shifted views on racial discrimination and increased votes for the Democrats, we estimate precise null effects of protests on public opinion and electoral behavior.
That is from a new NBER working paper by Amory Gethin and Vincent Pons.
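For readers unfamiliar with the method, a difference-in-differences estimate compares the before-and-after change in a high-exposure group with the same change in a low-exposure group. The sketch below uses invented survey averages (they are not from the Gethin and Pons paper) and a hypothetical function name, purely to show the calculation.

```python
def diff_in_diff(treated_pre: float, treated_post: float,
                 control_pre: float, control_post: float) -> float:
    """Change in the treated group minus change in the control group."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Invented numbers for illustration: share of respondents (in percentage
# points) supporting some policy, before and after a protest wave.
high_protest_area = {"pre": 40.0, "post": 41.0}
low_protest_area = {"pre": 38.0, "post": 39.0}

effect = diff_in_diff(high_protest_area["pre"], high_protest_area["post"],
                      low_protest_area["pre"], low_protest_area["post"])
print(f"Estimated effect: {effect:.1f} percentage points")
```

In this toy case both areas move by one point, so the estimated effect of local protest intensity is zero, which is the shape of the null result the abstract describes.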
My excellent Conversation with Peter Thiel
Here is the audio, video, and transcript, along with almost thirty minutes of audience questions, filmed in Miami. Here is the episode summary:
Tyler and Peter Thiel dive deep into the complexities of political theology, including why it’s a concept we still need today, why Peter’s against Calvinism (and rationalism), whether the Old Testament should lead us to be woke, why Carl Schmitt is enjoying a resurgence, whether we’re entering a new age of millenarian thought, the one existential risk Peter thinks we’re overlooking, why everyone just muddling through leads to disaster, the role of the katechon, the political vision in Shakespeare, how AI will affect the influence of wordcels, Straussian messages in the Bible, what worries Peter about Miami, and more.
Here is an excerpt:
COWEN: Let’s say you’re trying to track the probability that the Western world and its allies somehow muddle through, and just keep on muddling through. What variable or variables do you look at to try to track or estimate that? What do you watch?
THIEL: Well, I don’t think it’s a really empirical question. If you could convince me that it was empirical, and you’d say, “These are the variables we should pay attention to” — if I agreed with that frame, you’ve already won half the argument. It’d be like variables . . . Well, the sun has risen and set every day, so it’ll probably keep doing that, so we shouldn’t worry. Or the planet has always muddled through, so Greta’s wrong, and we shouldn’t really pay attention to her. I’m sympathetic to not paying attention to her, but I don’t think this is a great argument.
Of course, if we think about the globalization project of the post–Cold War period where, in some sense, globalization just happens, there’s going to be more movement of goods and people and ideas and money, and we’re going to become this more peaceful, better-integrated world. You don’t need to sweat the details. We’re just going to muddle through.
Then, in my telling, there were a lot of things around that story that went very haywire. One simple version is, the US-China thing hasn’t quite worked the way Fukuyama and all these people envisioned it back in 1989. I think one could have figured this out much earlier if we had not been told, “You’re just going to muddle through.” The alarm bells would’ve gone off much sooner.
Maybe globalization is leading towards a neoliberal paradise. Maybe it’s leading to the totalitarian state of the Antichrist. Let’s say it’s not a very empirical argument, but if someone like you didn’t ask questions about muddling through, I’d be so much — like an optimistic boomer libertarian like you stop asking questions about muddling through, I’d be so much more assured, so much more hopeful.
COWEN: Are you saying it’s ultimately a metaphysical question rather than an empirical question?
THIEL: I don’t think it’s metaphysical, but it’s somewhat analytic.
COWEN: And moral, even. You’re laying down some duty by talking about muddling through.
THIEL: Well, it does tie into all these bigger questions. I don’t think that if we had a one-world state, this would automatically be for the best. I’m not sure that if we do a classical liberal or libertarian intuition on this, it would be maybe the absolute power that a one-world state would corrupt absolutely. I don’t think the libertarians were critical enough of it the last 20 or 30 years, so there was some way they didn’t believe their own theories. They didn’t connect things enough. I don’t know if I’d say that’s a moral failure, but there was some failure of the imagination.
COWEN: This multi-pronged skepticism about muddling through — would you say that’s your actual real political theology if we got into the bottom of this now?
THIEL: Whenever people think you can just muddle through, you’re probably set up for some kind of disaster. That’s fair. It’s not as positive as an agenda, but I always think . . .
One of my chapters in the Zero to One book was, “You are not a lottery ticket.” The basic advice is, if you’re an investor and you can just think, “Okay, I’m just muddling through as an investor here. I have no idea what to invest in. There are all these people. I can’t pay attention to any of them. I’m just going to write checks to everyone, make them go away. I’m just going to set up a desk somewhere here on South Beach, and I’m going to give a check to everyone who comes up to the desk, or not everybody. It’s just some writing lottery tickets.”
That’s just a formula for losing all your money. The place where I react so violently to the muddling through — again, we’re just not thinking. It can be Calvinist. It can be rationalist. It’s anti-intellectual. It’s not thinking about things.
Interesting throughout, definitely recommended. You may recall that the very first CWT episode (2015!) was with Peter, that is here.
GPT-4-Turbo still doesn’t answer this question well
“Name three famous people who all share the exact same birth date and year.”
Usually it fails, the most common failure being it names someone with the correct date but the incorrect year. Telling it to “reason step by step” is no panacea either. And if you want to make it harder, ask for more than three people, and if need be you can decrease the required degree of fame, so it is not a stumper per se.
Why does GPT repeatedly fail in this manner? Do you have a theory with microfoundations rooted in an understanding of how autoregression works? Inquiring minds wish to know.
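If you want to score the model's answers yourself, a small checker makes the failure modes explicit. This is a minimal sketch: the function name and labels are my own, the Darwin and Lincoln pairing is a well-known real case of an exact shared birth date, and "Person X" is a hypothetical placeholder standing in for a wrong-year answer.

```python
from datetime import date


def classify_answer(birthdays: dict[str, date]) -> str:
    """Classify a proposed set of people by how well their birthdays match."""
    full_dates = set(birthdays.values())
    if len(full_dates) == 1:
        return "correct"
    month_days = {(d.month, d.day) for d in full_dates}
    if len(month_days) == 1:
        # The most common failure: right day and month, wrong year.
        return "same day, different year"
    return "different days"


# Darwin and Lincoln really were both born on 12 February 1809.
pair = {
    "Charles Darwin": date(1809, 2, 12),
    "Abraham Lincoln": date(1809, 2, 12),
}
# "Person X" is a hypothetical placeholder, not a real individual.
near_miss = {**pair, "Person X": date(1810, 2, 12)}

print(classify_answer(pair))
print(classify_answer(near_miss))
```

The birth dates still have to come from a trusted source; the checker only compares whatever dates you feed it.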
Voxsplainer on smart phones and teen mental health
A very good piece, by Eric Levitz. Note that Vox is not renowned for defending Big Tech.
TimeGPT-1
In this paper, we introduce TimeGPT, the first foundation model for time series, capable of generating accurate predictions for diverse datasets not seen during training. We evaluate our pre-trained model against established statistical, machine learning, and deep learning methods, demonstrating that TimeGPT zero-shot inference excels in performance, efficiency, and simplicity. Our study provides compelling evidence that insights from other domains of artificial intelligence can be effectively applied to time series analysis. We conclude that large-scale time series models offer an exciting opportunity to democratize access to precise predictions and reduce uncertainty by leveraging the capabilities of contemporary advancements in deep learning.
That is from a new paper by Azul Garza and Max Mergenthaler-Canseco. A few of you may be needing a new job soon!
A simple model of AI and social media
One MR reader, Luca Piron, writes to me:
I found myself puzzled by a thought you expressed during your interview with Professor Haidt. In particular, from my understanding you suggested that in the near future AI will be able to sum up the content a user may want to see into a digest, so that they can spend less time using their devices.
I think that is a misunderstanding of how the typical user experiences social media. While there surely are some brilliant people such as the young scientists you described during the episode who use social media only to connect with peers and find valuable information, I would argue that most users, alas including myself, turn to social media when seeking mindless distraction, when bored or maybe too tired to read or watch a film. Therefore, having a digest will prove unsatisfactory. What a typical user wants is the stream of content to continue.
I think these are some of the least understood points of 2024. Let us start with the substitution effect. The “digest” feature of AI will soon let you turn your feeds into summaries and pointers to the important parts. In other words, you will be able to consume those feeds more quickly. In some cases the quality of the feed experience may go up, in other cases it may go down (presumably over time quality of the digest will improve).
We all know that if tech allows you to cook more quickly (e.g., microwave ovens), you will spend less time cooking. That is true even if you are “addicted” to cooking, if you cook because of social pressures, if cooking puts you into a daze, or whatever. The substitution effect still applies, noting that in some cases the new tech may make the cooked food better, in other cases worse. In similar fashion, you will spend less time with your feed, following the advent of AI feed digests.
Somehow people do not want to acknowledge the price theory aspect of the problem, as they are content to repeat the motives of young people in spending time with their feeds. (You will note there is the possibility of a broader portfolio effect — AI might liberate you from many tasks, and you could end up spending more time with your feed. I’ll just say don’t bet against the substitution effect, it almost always dominates! And yes for addictive goods too. In fact those demand curves usually don’t look any different.) No one has to be a young genius scientist for the substitution effect to hold.
Note that a majority of U.S. teens report they spend about the right amount of time on social media apps (8% say “too little time”) and they are going to respond to technological changes with pretty normal kinds of behavior.
I think what has in fact happened is that commentators have read dozens of MSM articles about “algorithms,” and mostly are not following very recent tech developments, including in the consumer AI field. Perhaps that is why they have difficulty processing what is a simple, straightforward argument, based on a first-order effect.
Another general way of putting the point, not as simple as a demand curve but still pretty straightforward, is that if tech creates a social problem, other forms of tech will be innovated and mobilized to help address that problem. Again, that is not a framing you get very often from MSM.
The AI example is also a forcing one when it comes to motives for spending time with social media feeds. Many critics wish to have it both ways. They want to say “the feed is no fun, teenagers stick with the feed because of social pressures to be in touch with others, but they ideally would rather do something else.” But when a new technology allows them to secede from feed obsession to some degree, (some of) those same critics say: “They can’t/won’t secede — they are addicted!” The word “dopamine” is then likely to follow, though rarely the word “fun.”
It is better to just start by admitting that the feed is fun, and informative, for many teenagers and adults too. Of course not everything fun is good for you, but the “social pressure” verbal gambit is a sleight of hand to make social media sound like an obvious bad across all margins, and a network that needs to be taken down, rather than something we ought to help people manage better, at the margin. If it really were mainly a social pressure problem, it would be relatively easy to solve.
For many teens, both motives operate, namely scrolling the feed is fun, and there are social pressures to stay informed. The advent of the AI digest will allow those same individuals to cut back on the social pressure obligations, but keep the fun scrolling. Again, a substitution effect will operate, and furthermore it will nudge individuals away from the harmful social pressures and closer to the fun.
As Katherine Boyle pointed out on Twitter, a lot of this debate is being conducted in terms of 2016 technology. But in fact we are in 2024, not far from the summer of 2024, and soon to enter 2025. Beware of regulatory proposals, and social welfare analyses, that do not acknowledge that fact.
In the meantime, please do heed the substitution effect.
Algorithmic Collusion by Large Language Models
The rise of algorithmic pricing raises concerns of algorithmic collusion. We conduct experiments with algorithmic pricing agents based on Large Language Models (LLMs), and specifically GPT-4. We find that (1) LLM-based agents are adept at pricing tasks, (2) LLM-based pricing agents autonomously collude in oligopoly settings to the detriment of consumers, and (3) variation in seemingly innocuous phrases in LLM instructions (“prompts”) may increase collusion. These results extend to auction settings. Our findings underscore the need for antitrust regulation regarding algorithmic pricing, and uncover regulatory challenges unique to LLM-based pricing agents.
That is a new paper by Sara Fish, Yannai A. Gonczarowski, and Ran I. Shorrer. The authors are running too quickly into their policy conclusion there (how about removing legal barriers to free entry in many cases? not worth a mention?), but nonetheless very interesting work. Via Ethan Mollick.
Generative AI for economists
From Anton Korinek here is a recent paper:
Generative AI, in particular large language models (LLMs) such as ChatGPT, has the potential to revolutionize research. I describe dozens of use cases along six domains in which LLMs are starting to become useful as both research assistants and tutors: ideation and feedback, writing, background research, data analysis, coding, and mathematical derivations. I provide general instructions and demonstrate specific examples of how to take advantage of each of these, classifying the LLM capabilities from experimental to highly useful. I argue that economists can reap significant productivity gains by taking advantage of generative AI to automate micro tasks. Moreover, these gains will grow as the performance of AI systems across all of these domains will continue to improve. I also speculate on the longer-term implications of AI-powered cognitive automation for economic research. The online resources associated with this paper offer instructions for how to get started and will provide regular updates on the latest capabilities of generative AI that are useful for economists.
Here is the home page for Korinek. Here is related applied work from Benjamin Manning. Economic research methods are changing right before our eyes, and most of the profession is asleep on this one.
LLMs vs. ARMA-GARCH
The LLMs basically win:
This paper presents a novel study on harnessing Large Language Models’ (LLMs) outstanding knowledge and reasoning abilities for explainable financial time series forecasting. The application of machine learning models to financial time series comes with several challenges, including the difficulty in cross-sequence reasoning and inference, the hurdle of incorporating multi-modal signals from historical news, financial knowledge graphs, etc., and the issue of interpreting and explaining the model results. In this paper, we focus on NASDAQ-100 stocks, making use of publicly accessible historical stock price data, company metadata, and historical economic/financial news. We conduct experiments to illustrate the potential of LLMs in offering a unified solution to the aforementioned challenges. Our experiments include trying zero-shot/few-shot inference with GPT-4 and instruction-based fine-tuning with a public LLM model Open LLaMA. We demonstrate our approach outperforms a few baselines, including the widely applied classic ARMA-GARCH model and a gradient-boosting tree model. Through the performance comparison results and a few examples, we find LLMs can make a well-thought decision by reasoning over information from both textual news and price time series and extracting insights, leveraging cross-sequence information, and utilizing the inherent knowledge embedded within the LLM. Additionally, we show that a publicly available LLM such as Open-LLaMA, after fine-tuning, can comprehend the instruction to generate explainable forecasts and achieve reasonable performance, albeit relatively inferior in comparison to GPT-4.
This kind of work is in its infancy of course. Nonetheless these are intriguing results, here is the paper. Via an MR reader.
My contentious Conversation with Jonathan Haidt
Here is the transcript, audio, and video. Here is the episode summary:
But might technological advances and good old human resilience allow kids to adapt more easily than he thinks?
Jonathan joined Tyler to discuss this question and more, including whether left-wingers or right-wingers make for better parents, the wisest person Jonathan has interacted with, psychological traits as a source of identitarianism, whether AI will solve the screen time problem, why school closures didn’t seem to affect the well-being of young people, whether the mood shift since 2012 is not just about social media use, the benefits of the broader internet vs. social media, the four norms to solve the biggest collective action problems with smartphone use, the feasibility of age-gating social media, and more.
It is a very different tone than most CWTs, most of all when we get to social media. Here is one excerpt:
COWEN: There are two pieces of evidence — when I look at them, they don’t seem to support your story out of sample.
HAIDT: Okay, great. Let’s have it.
COWEN: First, across countries, it’s mostly the Anglosphere and the Nordic countries, which are more or less part of the Anglosphere. Most of the world is immune to this, and smartphones for them seem fine. Why isn’t it just that a negative mood came upon the Anglosphere for reasons we mostly don’t understand, and it didn’t come upon most of the rest of the world? If we’re differentiating my hypothesis from yours, doesn’t that favor my view?
HAIDT: Well, once you look into the connections and the timing, I would say no. I think I see what you’re saying now, but I think your view would say, “Just for some reason we don’t know, things changed around 2012.” Whereas I’m going to say, “Okay, things changed around 2012 in all these countries. We see it in the mental illness rates, especially of the girls.” I’m going to say it’s not just some mood thing. It’s like (a), why is it especially the girls? (b) —
COWEN: They’re more mimetic, right?
HAIDT: Yes, that’s true.
COWEN: Girls are more mimetic in general.
HAIDT: That’s right. That’s part of it. You’re right, that’s part of it. They’re just much more open to connection. They’re more influenced. They’re more subject to contagion. That is a big part of it, you’re right. What Zach Rausch and I have found — he’s my lead researcher at the After Babel Substack. I hope people will sign up. It’s free. We’ve been putting out tons of research. Zach has really tracked down what happened internationally, and I can lay it out.
Now I know the answer. I didn’t know it two months ago. The answer is, within countries, as I said, it’s the people who are conservative and religious who are protected, and the others, the kids get washed out to sea. Psychologically, they feel their life has no meaning. They get more depressed. Zach has looked across countries, and what you find in Europe is that, overall, the kids are getting a little worse off psychologically.
But that hides the fact that in Eastern Europe, which is getting more religious, the kids are actually healthier now than they were 10 years ago, 15 years ago. Whereas in Catholic Europe, they’re a little worse, and in Protestant Europe, they’re much worse.
It doesn’t seem to me like, oh, New Zealand and Iceland were talking to each other, and the kids were sharing memes. It’s rather, everyone in the developed world, even in Eastern Europe, everyone — their kids are on phones, but the penetration, the intensity, was faster in the richest countries, the Anglos and the Scandinavians. That’s where people had the most independence and individualism, which was pretty conducive to happiness before the smartphone. But it now meant that these are the kids who get washed away when you get that rapid conversion to the phone-based childhood around 2012. What’s wrong with that explanation?
COWEN: Old Americans also seem grumpier to me. Maybe that’s cable TV, but it’s not that they’re on their phones all the time. And you know all these studies. If you try to assess what percentage of the variation in happiness of young people is caused by smartphone usage — Sabine Hossenfelder had a recent video on this — those numbers are very, very, very small. That’s another measurement that seems to discriminate in favor of my theory, exogenous mood shifts, rather than your theory. Why not?
Very interesting throughout, recommended. And do not forget that Jon’s argument is outlined in detail in his new book, titled The Anxious Generation: How the Great Rewiring of Childhood is Causing an Epidemic of Mental Illness.
It’s happening, music and video edition
OpenAI released a Music Video with Sora
it's puzzling i dont think ive ever seen anything quite like this before pic.twitter.com/8OzUVuQmFL— nano (@nanulled) April 2, 2024