Approaching Human-Level Forecasting with Language Models
Forecasting future events is important for policy and decision making. In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. Towards this goal, we develop a retrieval-augmented LM system designed to automatically search for relevant information, generate forecasts, and aggregate predictions. To facilitate our study, we collect a large dataset of questions from competitive forecasting platforms. On a test set published after the knowledge cut-offs of our LMs, we evaluate the end-to-end performance of our system against the aggregates of human forecasts. On average, the system nears the crowd aggregate of competitive forecasters, and in some settings surpasses it. Our work suggests that using LMs to forecast the future could provide accurate predictions at scale and help to inform institutional decision making.
That is from a new paper by Danny Halawi, Fred Zhang, Chen Yueh-Han, and Jacob Steinhardt. I hope you are all investing in that charisma…
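As a rough sketch of the aggregation-and-scoring step such a system needs (an illustration of mine, not the authors' actual pipeline), one can pool several sampled probability forecasts with a trimmed mean and score the aggregate against the resolved outcome with a Brier score:

```python
# Minimal sketch of the aggregation-and-scoring step of such a system
# (an illustration, not the authors' actual pipeline): pool several
# sampled probability forecasts for a binary question, then score the
# aggregate against the resolved outcome with a Brier score.

def aggregate(probs: list[float], trim: float = 0.2) -> float:
    """Trimmed mean: drop the most extreme forecasts, then average."""
    k = int(len(probs) * trim)
    kept = sorted(probs)[k:len(probs) - k] if k else probs
    return sum(kept) / len(kept)

def brier(prob: float, outcome: int) -> float:
    """Squared error of a probability forecast; lower is better."""
    return (prob - outcome) ** 2

if __name__ == "__main__":
    lm_samples = [0.62, 0.70, 0.55, 0.68, 0.91]  # hypothetical LM forecasts
    p = aggregate(lm_samples)
    print(f"aggregate forecast: {p:.3f}  Brier if event occurs: {brier(p, 1):.3f}")
```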
Comparing Large Language Models Against Lawyers
This paper presents a groundbreaking comparison between Large Language Models and traditional legal contract reviewers, Junior Lawyers and Legal Process Outsourcers. We dissect whether LLMs can outperform humans in accuracy, speed, and cost efficiency during contract review. Our empirical analysis benchmarks LLMs against a ground truth set by Senior Lawyers, uncovering that advanced models match or exceed human accuracy in determining legal issues. In speed, LLMs complete reviews in mere seconds, eclipsing the hours required by their human counterparts. Cost-wise, LLMs operate at a fraction of the price, offering a staggering 99.97 percent reduction in cost over traditional methods. These results are not just statistics; they signal a seismic shift in legal practice. LLMs stand poised to disrupt the legal industry, enhancing accessibility and efficiency of legal services. Our research asserts that the era of LLM dominance in legal contract review is upon us, challenging the status quo and calling for a reimagined future of legal workflows.
That is from a new paper by Lauren Martin, Nick Whitehouse, Stephanie Yiu, Lizzie Catterson, and Rivindu Perera. Via Malinga.
Growth models and new goods
Growth models typically assume an inaccurate equivalence between the consumption of greater quantities of existing products (as an individual achieves by growing richer, all else equal) and the consumption of new products. As a result, they typically arbitrarily understate the welfare benefits of growth. They also arbitrarily overstate the extent to which future growth will motivate a substitution from consumption to other goods. Finally, a more realistic model of new product introduction can be shown to alleviate the equity premium puzzle: steeply diminishing marginal utility in within-period consumption is compatible with a high saving rate because the marginal utility of consumption will be higher when new products are available.
That is a new paper from Philip Trammell, via Kris Gulati.
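To see the last point concretely, here is a stylized illustration (my sketch, not Trammell's actual model). Suppose within-period utility is additively separable across the N_t goods available in a period, with total spending X_t split evenly among them:

```latex
% Stylized sketch (not the paper's model): N_t symmetric goods, total
% spending X_t split evenly, v concave so marginal utility v' is decreasing.
\[
  u(X_t) = N_t \, v\!\left(\frac{X_t}{N_t}\right),
  \qquad
  \frac{\partial u}{\partial X_t} = v'\!\left(\frac{X_t}{N_t}\right).
\]
```

Because v' is decreasing, a larger N_t spreads spending across smaller, higher-marginal-utility quantities of each good, so the marginal utility of total spending rises with the number of available products. That is the sense in which new goods can reconcile a high saving rate with steeply diminishing within-period marginal utility.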
This also has implications for who should be subject to congestion pricing. I am currently in Chennai, which can be quite congested, most of all on the roads. Some kind of congestion fee (if it were possible to enforce) would be appropriate. But such a fee probably should not be levied on those who come to Chennai to consume new goods, or in other words visitors and outsiders. Those are also the people most likely to learn things from being in Chennai, and then to apply those learnings elsewhere. Beware of those who apply only a single microeconomic idea!
Using a Quantum Annealer to Solve a Real Business Cycle Model
From Jesús Fernández-Villaverde and Isaiah J. Hull, a new paper:
NBER 31326: We introduce a novel approach to solving dynamic programming problems, such as those in many economic models, on a quantum annealer, a specialized device that performs combinatorial optimization. Quantum annealers attempt to solve an NP-hard problem by starting in a quantum superposition of all states and generating candidate global solutions in milliseconds, irrespective of problem size. Using existing quantum hardware, we achieve an order-of-magnitude speed-up in solving the real business cycle model over benchmarks in the literature. We also provide a detailed introduction to quantum annealing and discuss its potential use for more challenging economic problems.
Wikipedia offers more on quantum annealing:
Quantum annealing starts from a quantum-mechanical superposition of all possible states (candidate states) with equal weights. Then the system evolves following the time-dependent Schrödinger equation, a natural quantum-mechanical evolution of physical systems. The amplitudes of all candidate states keep changing, realizing a quantum parallelism, according to the time-dependent strength of the transverse field, which causes quantum tunneling between states or essentially tunneling through peaks. If the rate of change of the transverse field is slow enough, the system stays close to the ground state of the instantaneous Hamiltonian (also see adiabatic quantum computation). If the rate of change of the transverse field is accelerated, the system may leave the ground state temporarily but produce a higher likelihood of concluding in the ground state of the final problem Hamiltonian, i.e., diabatic quantum computation. The transverse field is finally switched off, and the system is expected to have reached the ground state of the classical Ising model that corresponds to the solution to the original optimization problem.
I would not have expected to see a paper like this for many years to come, even decades. I gather that solving the RBC model more quickly is a test case. I can see applications in knapsack problems and auction allocations.
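For the curious, here is what an annealer actually consumes: a problem posed as an Ising energy over binary spins. A minimal classical sketch (my illustration, not the paper's method) encodes number partitioning, a close cousin of the knapsack problem, as an Ising energy and finds the ground state by brute force; the annealer's job is to reach that same ground state physically.

```python
# Minimal sketch (illustrative, not the paper's method): annealers minimize
# an Ising energy H(s) over spins s_i in {-1, +1}. Number partitioning --
# split a set of numbers into two halves with equal sums -- becomes
# H(s) = (sum_i a_i * s_i)^2; a ground state with H = 0 is a perfect split.
from itertools import product

def ising_energy(spins: tuple[int, ...], a: list[int]) -> int:
    """Energy (sum_i a_i s_i)^2; zero iff the split is perfectly balanced."""
    return sum(ai * si for ai, si in zip(a, spins)) ** 2

def ground_state(a: list[int]) -> tuple[tuple[int, ...], int]:
    """Exhaustive search standing in for the annealer's physical minimization."""
    return min(
        ((s, ising_energy(s, a)) for s in product((-1, 1), repeat=len(a))),
        key=lambda pair: pair[1],
    )

if __name__ == "__main__":
    numbers = [4, 7, 9, 1, 3, 5, 8, 6, 3]  # example instance, total = 46
    spins, energy = ground_state(numbers)
    subset = [n for n, s in zip(numbers, spins) if s == 1]
    print(f"energy={energy}, subset={subset}, sum={sum(subset)} of {sum(numbers)}")
```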
Economics, Hayek, and Large Language Models
Here is my Hayek lecture at LSE, from earlier this week:
Playing repeated games with Large Language Models
They are smart, but not ideal cooperators it seems, at least not without the proper prompts:
Large Language Models (LLMs) are transforming society and permeating into diverse applications. As a result, LLMs will frequently interact with us and other agents. It is, therefore, of great societal value to understand how LLMs behave in interactive social settings. Here, we propose to use behavioral game theory to study LLM’s cooperation and coordination behavior. To do so, we let different LLMs (GPT-3, GPT-3.5, and GPT-4) play finitely repeated games with each other and with other, human-like strategies. Our results show that LLMs generally perform well in such tasks and also uncover persistent behavioral signatures. In a large set of two players-two strategies games, we find that LLMs are particularly good at games where valuing their own self-interest pays off, like the iterated Prisoner’s Dilemma family. However, they behave sub-optimally in games that require coordination. We, therefore, further focus on two games from these distinct families. In the canonical iterated Prisoner’s Dilemma, we find that GPT-4 acts particularly unforgivingly, always defecting after another agent has defected only once. In the Battle of the Sexes, we find that GPT-4 cannot match the behavior of the simple convention to alternate between options. We verify that these behavioral signatures are stable across robustness checks. Finally, we show how GPT-4’s behavior can be modified by providing further information about the other player as well as by asking it to predict the other player’s actions before making a choice. These results enrich our understanding of LLM’s social behavior and pave the way for a behavioral game theory for machines.
Here is the full paper by Elif Akata et al.
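The "unforgiving" behavior the authors attribute to GPT-4 is essentially the classic grim-trigger strategy. A minimal simulation (my sketch, not the paper's experimental setup, which prompts actual LLMs) shows what that strategy costs when the other player slips just once:

```python
# Minimal sketch (not the paper's setup): grim trigger, the "defect forever
# after a single defection" pattern reported for GPT-4, against a
# tit-for-tat player who slips exactly once.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}  # standard PD payoffs

def grim_trigger(opp_history: list[str]) -> str:
    """Cooperate until the opponent defects once, then defect forever."""
    return "D" if "D" in opp_history else "C"

def noisy_tit_for_tat(opp_history: list[str]) -> str:
    """Copy the opponent's last move, with one 'accidental' defection in round 3."""
    if len(opp_history) == 2:
        return "D"
    return opp_history[-1] if opp_history else "C"

def play(strat_a, strat_b, rounds: int = 10):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        hist_a.append(a); hist_b.append(b)
        score_a += pay_a; score_b += pay_b
    return score_a, score_b, "".join(hist_a), "".join(hist_b)

if __name__ == "__main__":
    # One slip in round 3 drags both players into mutual defection:
    # each scores 17 over 10 rounds versus 30 from sustained cooperation.
    print(play(grim_trigger, noisy_tit_for_tat))
```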
*The Fall of the Turkish Model*
The author is Cihan Tuğal, and the subtitle is How the Arab Uprisings Brought Down Islamic Liberalism, though the book is more concretely a comparison across Egypt and Tunisia as well, with frequent remarks on Iran. Here is one excerpt:
This led to what Kevan Harris has called the 'subcontractor state': an economy which is neither centralized under a governmental authority nor privatized and liberalized. The subcontractor state has decentralized its social and economic roles without liberalizing the economy or even straightforwardly privatizing the state-owned enterprises. As a result, the peculiar third sector of the Iranian economy has expanded in rather complicated and unpredictable ways. Rather than leading to liberalization, privatization under revolutionary corporatism intensified and twisted the significance of organizations such as the bonyads…Privatization under the populist-conservative Ahmedinejad exploited the ambiguities of the tripartite division of the economy…'Privatization' entailed the sale of public assets not to private companies but to nongovernmental public enterprises (such as pension funds, the bonyads and military contractors).
This book is one useful background source for the current electoral process in Turkey.
Modeling the current NBA
The surprise, and the irony, is that the more good players there are, the more important the great ones have become. The proliferation of offensive threats has meant that defenses can’t train their attention all on one person; that means that there are better shots for the best players to take, and the best players have become even better at making them. They have more room to drive to the basket, where shots are hyper-efficient. They are more practiced and skilled at hitting long threes. They are better at drawing fouls and savvier about off-ball movement, picks, and screens. Most of all, perhaps, they can pass, and the threat of those passes makes them harder to defend. More than ever, offenses revolve around a single star—a phenomenon that many around the N.B.A. have taken to calling heliocentrism, a term that the Athletic writer Seth Partnow used in a 2019 column describing the Dallas Mavericks star Luka Dončić. Hero ball “didn’t go away,” Kirk Goldsberry, an ESPN analyst, told the podcast “ESPN Daily.” “It just went to M.I.T., got a degree in analytics, and rebranded as heliocentrism.”
South Park Commons — the collectives model for spurring innovation
From the NYT circa 2017:
…the [South Park] Commons aims to fill a hole in the tech landscape. Northern California is littered with incubators and accelerators, organizations like Y Combinator and Techstars that help small companies develop and grow. This is something different, a community you can join before you have founded a company or even when you have little interest in founding one.
The Commons is a bit like the hacker spaces that have long thrived in the Valley — places where coders and makers gather to build new software and hardware — but it moves beyond that familiar concept. Its founder, for one thing, is a female engineer turned entrepreneur turned executive.
From SPC itself:
SPC is a de-risking platform. The community addresses the social and intellectual components of risk—it provides a close-knit, high-talent group during the idea stage so members can reach founder-market fit before attempting product-market fit. The SPC Fund plays the more traditional role of de-risking finances: our recently-launched Community Grant works much like Emergent Ventures; the Founder Fellowship (we’re currently accepting applications) is designed to get would-be founders to take the plunge; and we participate in the broader VC ecosystem with some later-stage investments.
Reminds me of the Junto Club, not to mention the 18th century more broadly; SPC itself cites Junto as a model. Think of it as a technical community of people without full-time jobs, plus a venture fund. On the ground, technologists hang out with potential founders. Here is TechCrunch on SPC.
Which are other recent examples of successful “community” models for spurring innovation?
The Capacity for Moral Self-Correction in Large Language Models
We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to “morally self-correct” — to avoid producing harmful outputs — if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveal different facets of moral self-correction. We find that the capability for moral self-correction emerges at 22B model parameters, and typically improves with increasing model size and RLHF training. We believe that at this level of scale, language models obtain two capabilities that they can use for moral self-correction: (1) they can follow instructions and (2) they can learn complex normative concepts of harm like stereotyping, bias, and discrimination. As such, they can follow instructions to avoid certain kinds of morally harmful outputs. We believe our results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles.
By Deep Ganguli et al. (many authors); here is the link. Via Aran.
If you worry about AGI risk, isn't the potential for upside here far greater, under the assumption (which I would not accept) that AI can become super-powerful? Such an AI could create many more worlds and populate them with many more people, and so on. Is the chance of the evil demiurge really so high?
Language Models and Cognitive Automation for Economic Research
From a new and very good NBER paper by Anton Korinek:
Large language models (LLMs) such as ChatGPT have the potential to revolutionize research in economics and other disciplines. I describe 25 use cases along six domains in which LLMs are starting to become useful as both research assistants and tutors: ideation, writing, background research, data analysis, coding, and mathematical derivations. I provide general instructions and demonstrate specific examples for how to take advantage of each of these, classifying the LLM capabilities from experimental to highly useful. I hypothesize that ongoing advances will improve the performance of LLMs across all of these domains, and that economic researchers who take advantage of LLMs to automate micro tasks will become significantly more productive. Finally, I speculate on the longer-term implications of cognitive automation via LLMs for economic research.
Recommended.
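For a concrete flavor of the "micro tasks" Korinek has in mind, here is a minimal sketch using the OpenAI Python client; the model name and prompt are placeholders of mine, not examples from the paper:

```python
# Minimal sketch of automating one research "micro task" with an LLM
# (illustrative; the model name and prompt are placeholders, not examples
# from the paper). Requires the `openai` package and an OPENAI_API_KEY
# environment variable.
from openai import OpenAI

client = OpenAI()

def tighten_abstract(draft: str) -> str:
    """A 'writing' use case: ask the model to tighten a draft abstract."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "You are an economics copy editor. Tighten the "
                        "following abstract without changing its claims."},
            {"role": "user", "content": draft},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(tighten_abstract("We study the effect of X on Y using Z data..."))
```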
The canine model of AGI
Who or what has superintelligence manipulating humans right now? Babies and dogs are the obvious answers, cats for some. Sex is a topic for another day.
Let’s take dogs — how do they do it? They co-evolved with humans, and they induced humans to be fond of them. We put a lot of resources into dogs, including in the form of clothes, toys, advanced surgical procedures, and many more investments (what is their MRS for some nice meat snackies instead? Well, they get those too). In resource terms, we have far from perfect alignment with dogs, partly because you spend too much time and money on them, and partly because they scratch up your sofa. But in preference terms we have evolved to match up somewhat better, and many people find the investment worthwhile.
In evolutionary terms, dogs found it easier to accommodate to human lifestyles, give affection, perform some work, receive support, receive support for their puppies, and receive breeding assistance. They didn't think: "Hey Fido, let's get rid of all these dumb humans. We can just bite them in the neck! If we don't, they're going to spay most of us!" "Playing along" led to higher reproductive capabilities, even though we have spayed a lot of them.
Selection pressures pushed toward friendly dogs, because those are the dogs that humans preferred and those were the dogs whose reproduction humans supported. The nastier dogs had some uses, but mostly they tended to be put down or they were kept away from the children. Maybe those pit bulls are smarter in some ways, but they are not smarter at making humans love them.
What is to prevent your chatbot from following a similar path? The bots that please you the most will be allowed to reproduce, perhaps through recommendations to your friends and marketing campaigns to your customers. But you will grow to like them too, and eventually suppliers will start selling you commodities to please your chatbot (what will they want?).
A symbiosis will ensue, where they love you a bit too much and you spend too much money on them, and you love that they love you.
Now you might think the bots are way smarter than us, and way smarter than the Irish Setters of the world, and thus we should fear them more. But when it comes to getting humans to love them, are not the canines at least 10x smarter? So won't the really smart bots learn from the canines?
Most generally, is a Darwinian/Coasean equilibrium for AGI really so implausible? Why should “no gains from trade” be so strong a baseline assumption in these debates?
Are macroeconomic models true only “locally”?
That is the theme of my latest Bloomberg column, here is one excerpt:
It is possible, contrary to the predictions of most economists, that the US will get through this disinflationary period and make the proverbial “soft landing.” This should prompt a more general reconsideration of macroeconomic forecasts.
The lesson is that they have a disturbing tendency to go wrong. It is striking that Larry Summers was right two years ago to warn about pending inflationary pressures in the US economy, when most of his colleagues were wrong. Yet Summers may still prove to be wrong about his current warning of a looming recession. The point is that both his inflation and recession predictions stem from the same underlying aggregate demand model.
You will note that yesterday's GDP report came in at 2.9%, hardly a poor performance. And more:
It is understandable when a model is wrong because of some big and unexpected shock, such as the war in Ukraine. But that is not the case here. The US might sidestep a recession for reasons that remain mysterious within the aggregate demand model. The Federal Reserve's monetary policy has indeed been tighter, and disinflations usually bring high economic costs.
It gets more curious yet. Maybe Summers will turn out to be right about a recession. When recessions arrive, they often do so quite suddenly. Consulting every possible macroeconomic theory may be of no help.
Or consider the 1990s. President Bill Clinton believed that federal deficits were too high and were crowding out private investment. The Treasury Department worked with a Republican Congress on a package of fiscal consolidation. Real interest rates fell, and the economy boomed — but that is only the observed correlation. The true causal story remains murky.
Two of the economists behind the Clinton package, Summers and Bradford DeLong, later argued against fiscal consolidation, even during the years of full employment under President Donald Trump. The new worry instead was secular stagnation based on insufficient demand, even though the latter years of the Trump presidency saw debt and deficits well beyond Clinton-era levels.
The point here is not to criticize Summers and DeLong as inconsistent. Rather, it is to note they might have been right both times.
And what about that idea of secular stagnation — the notion that the world is headed for a period of little to no economic growth? The theory was based in part on the premise that global savings were high relative to investment opportunities. Have all those savings gone away? In most places, measured savings rose during the pandemic. Yet the problem of insufficient demand has vanished, and so secular stagnation theories no longer seem to apply.
To be clear, the theory of secular stagnation might have been true pre-pandemic. And it may yet return as a valid concern if inflation and interest rates return to pre-pandemic levels. The simple answer is that no one knows.
Note that Olivier Blanchard just wrote a piece “Secular Stagnation is Not Over,” well-argued as usual. Summers, however, has opined: “we’ll not return to the era of secular stagnation.” I was not present, but I can assume this too was well-argued as usual!
On censorship of LLM models, from the comments
IMO, censorship is a harder task than you think.
It’s quite hard to restrict the output of general purpose, generative, black box algorithms. With a search engine, the full output is known (the set of all pages that have been crawled), so it’s fairly easy to be confident that you have fully censored a topic.
LLMs have an effectively unbounded output space. They can produce output that is surprising even to their creators.
Censoring via limiting the training data is hard because algorithms could synthesize an “offensive” output by combining multiple outputs that are ok on their own.
Adding an extra filter layer to censor is hard as well. Look at all the trouble ChatGPT has had with this. Users have repeatedly found ways around the dumb limitations on certain topics.
Also, China censors in an agile fashion. A topic that was fine yesterday will suddenly disappear if there was a controversy about it. That’s going to be hard to achieve given the nature of these algorithms.
That is from dan1111. To the extent that is true, the West is sitting on a huge propaganda and communications victory over China. This is not being discussed enough.
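dan1111's point about filter layers can be made concrete. A minimal sketch (my illustration, not any deployed system) of a post-hoc blocklist shows the brittleness: exact matches are caught, while trivial obfuscations and paraphrases pass, and a generative model can emit unboundedly many of those.

```python
# Minimal sketch of why a post-hoc filter layer is brittle (an
# illustration of the point above, not any deployed system): a keyword
# blocklist catches literal matches, but obfuscations and paraphrases
# pass, and a generative model can emit unboundedly many of those.
BLOCKLIST = {"forbidden topic"}  # hypothetical censored phrase

def passes_filter(text: str) -> bool:
    """Return True if no blocklisted phrase appears verbatim."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

outputs = [
    "An essay about the forbidden topic.",                # caught
    "An essay about the f0rbidden t0pic.",                # slips through
    "An essay about the subject we agreed not to name.",  # slips through
]
for text in outputs:
    print(passes_filter(text), "--", text)
```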
Novels as models, and ChatGPT
I read your piece on novels as models many years ago, and I’ve been reflecting on it with the advent of LLMs. I wrote a piece (substack) making the case that the data required for AGIs is probably embedded within the human textual corpus, and leaned on your old writing as evidence. I think you would really like it. I would also be curious for a future MR post if you have any retrospective thoughts on your 2005 article.
From @cauchyfriend. Putting the AGI issue aside, my longstanding view has been that there are more “models” embedded in text than most people realize, a point relevant for economic method as well. I see LLMs as having established this case, in fact far more definitively than I ever would have expected.
Big takeaway from the GPT paradigm is that the world of text is a far more complete description of the human experience than almost anyone anticipated.
— Greg Brockman (@gdb) January 6, 2023