Behavioral Economics and GPT-4: From William Shakespeare to Elena Ferrante

There is a new paper on LLMs by Gabriel Abrams; here is the abstract:

We prompted GPT-4 (a large language model) to play the Dictator game, a classic behavioral economics experiment, as 148 literary fictional characters from the 17th century to the 21st century. 

Of literary interest, this paper analyzed character selfishness by century, the relative frequency of literary character personality traits, and the average valence of these traits. The paper also analyzed character gender differences in selfishness.

From an economics/AI perspective, this paper generates specific and quantifiable Turing tests which the model passed for zero price effect, lack of spitefulness and altruism, and failed for human sensitivity to relative ordinal position and price elasticity (elasticity is significantly lower than humans). Model updates from March to August 2023 had relatively minor impacts on Turing test outcomes.

There is a general and mainly monotonic decrease in selfish behavior over time in literary characters. 50% of the decisions of characters from the 17th century are selfish compared to just 19% of the decisions of characters from the 21st century. Overall, humans exhibited much more selfish behavior than AI characters, with 51% of human decisions being selfish compared to 32% of decisions made by AI characters.

Historical literary characters have a surprisingly strong net positive valence across 2,785 personality traits generated by GPT-4 (3.2X more positive than negative). However, valence varied significantly across centuries. The most positive century, in terms of personality traits, was the 21st — over 10X the ratio of positive to negative traits. The least positive century was the 17th at just 1.8X. “Empathetic,” “fair” and “selfless,” were the most overweight traits in the 20th century. Conversely, “manipulative,” “ambitious” and “ruthless” were the most overweight traits in the 17th century.

Male characters were more selfish than female characters: 35% of male decisions were selfish compared to just 24% for female characters. The skew was highest in the 17th century where selfish decisions for male and female were 62% and 20% respectively.

This analysis offers a specific and quantifiable partial Turing test. In a few ways, the model is remarkably human-like; the key human-like characteristics are the zero price effect, lack of spitefulness and altruism. However, in other ways, GPT-4 reflects unusual or inhuman preferences. The model does not appear to have human sensitivity to relative ordinal position and has significantly lower price elasticity than humans.

Model updates in GPT-4 have made it slightly more sensitive to ordinal value, but not more selfish. The model shows preference consistency across model runs for each character with respect to selfishness.

To which journal might you advise him to send this paper?

How Credible is the Credibility Revolution?

When economists analyze a well-conducted RCT or natural experiment and find a statistically significant effect, they conclude the null of no effect is unlikely to be true. But how frequently is this conclusion warranted? The answer depends on the proportion of tested nulls that are true and the power of the tests. I model the distribution of t-statistics in leading economics journals. Using my preferred model, 65% of narrowly rejected null hypotheses and 41% of all rejected null hypotheses with |t|<10 are likely to be false rejections. For the null to have only a .05 probability of being true requires a t of 5.48.

That is from a new NBER working paper by Kevin Lang.
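To see why the answer "depends on the proportion of tested nulls that are true and the power of the tests," here is a minimal sketch of the textbook Bayes' rule calculation. It is not Lang's model, which fits the distribution of t-statistics; the function name and the input values are hypothetical and chosen only for illustration.

```python
def false_rejection_share(share_true_nulls, alpha, power):
    """P(null is true | null was rejected), by Bayes' rule.

    share_true_nulls: prior share of tested nulls that are actually true
    alpha: probability of rejecting a true null (test size)
    power: probability of rejecting a false null
    """
    false_rejections = alpha * share_true_nulls
    true_rejections = power * (1.0 - share_true_nulls)
    return false_rejections / (false_rejections + true_rejections)


# Hypothetical inputs, NOT estimates from the paper.
for share, power in [(0.5, 0.8), (0.8, 0.5), (0.9, 0.3)]:
    result = false_rejection_share(share, alpha=0.05, power=power)
    print(f"true nulls={share:.0%}, power={power:.0%} -> false rejections={result:.1%}")
```

The share of false rejections rises quickly as true nulls become more common and power falls, even with the conventional 5% threshold held fixed.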

Saturday assorted links

1. #lazygirljob (WSJ).

2. The economic foundations of Kenyan street protests.  And some interest groups are rebelling against Nigerian reforms.

3. Product liability for defective AI.  A law and economics study.

4. Diagnostic medical errors are a huge problem.

5. Lebanese man holds up bank to get his own money out (NYT).

6. More Okie-dokie about fabricated research.

7. Good Tim Lee primer on how LLMs work.

Sri Lanka travel notes

Have you ever been to a perfect spot and wished “there should be an amazing hotel right there, except I want the hotel without any accompanying crowding or corruptions of tourism?”

If that is your desire, Sri Lanka is the country for you.  (Who cares if that is an apparent violation of the laws of economics and location theory?)  If you have visited Sri Lanka, likely you will know what I mean — it is simply so nice.  So much the right blend of exotic and comfortable.  It feels so unspoilt, so fresh, and so natural.  So easy on the visitor, as if it were a well-run state of India with a big dose of Buddhism and a vaguely Caribbean vibe, and without the extreme population density.

Here was my Kandalama hotel, let the link rotate through all the images.

Here is my current hotel, only about 30 or 40 feet away from one of the world’s major Buddhist temple complexes, medieval and mostly dating from the 12th century.  None of it is close to expensive, no matter how high the quality.

Galle, on the southern coast, is a lovely colonial city, largely intact, with notable Dutch, Portuguese, and British buildings, as well as mosques.  It is ringed by an old fort, and has numerous good views of the ocean.  Everything is walkable.  Russians are the single largest tourist group there (no visa required), and yet the town does not feel overwhelmed, even on the cusp of August.  Try Galle Fort Hotel, which is also a UNESCO heritage site.

There is an aesthetic look to so many things.  If a farmer builds a tree house so he can monitor his crops at night without being stomped by elephants, the tree house will be pretty nice, even though the farmer is poor.

So much of Sri Lanka feels like the 1980s, in a way that is good for you but not good for them.  On the plus side, education, literacy (92%), and social indicators are high.  Life expectancy is higher than in the United States.  You can drive around deep into the rural areas, and you just won’t see extreme poverty.  Nor are the drivers crazy, so the travel isn’t stressful.  The country never seems internet-obsessed.

The total fertility rate is currently about 2.0, a blessing in the short run but likely a disaster over time.  At about 4k per capita income (about 14k PPP), Sri Lanka cannot afford to grow old before it becomes wealthy.  And I can assure you, it is not on the verge of becoming wealthy.

The important buildings — and there are many of them — are all of earlier vintages.  The hotels of Geoffrey Bawa — in a style sometimes called “tropical modernism” — are of special interest.  Simply tracking down all of the Bawa hotels would be a good way of organizing your trip.  Sri Lanka is one of the few countries in the world where the very nice old architecture and the very nice newer architecture bear some aesthetic relation to each other.

Did I mention this?:

In 2022, with its GDP contracting by 7.8%, Sri Lanka was one of the worst performing economies in the world. Annual inflation was around 60%, and the currency depreciated by over 80%. These quantifiable measures of pain were exacerbated by severe shortages and uncertainty in accessing fuel, gas and medicines. Daily power cuts became normalized.

Living standards remain lower, yet visible signs of those earlier troubles are gone.  The infrastructure now works once again, yet the country feels psychically scarred by the recent economic collapse.

One Sri Lankan MR reader, with whom I chatted, argued that the country has no person, or even a group, “in charge” at the wheel.  So small problems drift and sometimes turn into major crises.  The country’s “immune system” simply is not functioning.

It is striking that none of the major parties have good ideas, as there is mainly corrupt oligarchy or Marxists.  Liberalism is nowhere to be seen.  Behind the scenes, Sri Lanka is a country that outside parties (most of all China and India, earlier the colonial powers) have cared about far too much.  It never feels like there is a stable political equilibrium upon which to build, because the outsiders have so much power.

Internally, they still have not generated a true consensus on what the country is about, and whether they are all willing to live in peace with each other.  As one of my drivers put it succinctly: “I don’t like the other religions.”

With textiles and tea as the major exports, the country shows no signs of moving up the value chain. Nonetheless Sri Lanka remains richer per capita than India.  And easier to handle, albeit far less dynamic.  Someone could write a very Sri Lankan version of “the complacent class.”

How good is Buddhism for economic growth anyway?

I will do a separate post on food in Sri Lanka.

David J. Deming now has a Substack

Forked Lightning, he is from the Harvard Kennedy School, and he is a co-author on the piece with Chetty and John N. Friedman featured on MR earlier today.

In his inaugural post he explains some further results from the paper in more detail:

The second part [of the paper] shows the impact of attending an Ivy-Plus college. Do these colleges actually improve student outcomes, or are they merely cream-skimming by admitting applicants who would succeed no matter where they went to college?[2]

We focus on students who are placed on the waitlist. These students are less qualified than regular admits but more qualified than regular rejects. Crucially, the waitlist admits don’t look any different in terms of admissibility than the waitlist rejects. We verify this by showing that being admitted off the waitlist at one college doesn’t predict admission at other colleges. Intuitively, getting in off the waitlist is about class-balancing and yield management, not overall merit. The college needs an oboe player, or more students from the Mountain West, or whatever. It’s not strictly random, but it’s unrelated to future outcomes (there are a lot of technical details here that I’m skipping over, including more tests of balance in the waitlist sample – see the paper for details). We also show that we get similar results with a totally different research design that others have used in past work (see footnote 2).

Almost everyone who gets admitted off an Ivy-Plus college waitlist accepts the offer. Those who are eventually rejected go to a variety of other colleges, including other Ivy-Plus institutions. We scale our estimates to the plausible alternative of attending a state flagship public institution. In other words, we want to know how an applicant’s life outcomes would differ if they attended a place like Harvard (where I work) versus Ohio State (the college I attended – I did not apply to Harvard, but if I did I surely would have been *regular* rejected!)

We find that students admitted off the waitlist are about 60 percent more likely to have earnings in the top 1 percent of their age by age 33. They are nearly twice as likely to attend a top 10 graduate school, and they are about 3 times as likely to work in a prestigious firm such as a top research hospital, a world class university, or a highly ranked finance, law or consulting firm. Interestingly, we find only small impacts on mean earnings. This is because students attending good public universities typically do very well. They earn 80th-90th percentile incomes and attend very good but not top graduate schools.

The bottom line is that going to an Ivy-Plus college really matters, especially for high-status positions in society.

In a further Substack post, Deming explains in more detail why the classic Dale and Krueger result (that, adjusting for student quality, you can go to the lesser school) no longer holds, due to limitations in their data.  Of course all this bears on the “education as signaling” debates as well.

By the way, it took the authors more than five years to write that paper.  Deming adds: “The paper is 125 pages long. It has 25 main exhibits (6 tables and 19 figures), and another 36 appendix exhibits.”

Here is Deming’s home page.  He is a highly rated economist, yet still underrated.

Sunday assorted links

1. Economic development as reflected in the emotions expressed in paintings, as measured by AI.

2. The only thing keeping South Africa from collapse is its private sector (Bloomberg).

3. The making of Yunnan.

4. Good Anna Gát review of Oppenheimer.  And good Henry Oliver review.

5. Sri Lanka update.

6. How well do some commonly-recognized happiness strategies work?  Recommended, both the thread and the paper.

Friday assorted links

1. Oregon drug decriminalization is not going as well as expected (Atlantic).

2. The future of Eden Center in Falls Church.

3. “This is a game that tests your ability to predict (“forecast”) how well GPT-4 will perform at various types of questions.”

4. Why are so many kosher restaurants so bad?

5. “…a pre-registered experiment randomizing Pennsylvania residents (n=5,059) to staggered interventions encouraging news consumption from leading state newspapers. 2,529 individuals were offered free online subscriptions, but only 44 subscribed…”  Link here.

6. Rasheed Griffith podcast with Craig Palsson on Haitian economic history.

7. Shruti interviews Peter Boettke on Austrian economics and the knowledge problem.

The nuclear polity that is Georgia

The first new nuclear reactor built in the United States in more than 40 years is now up and running in Waynesboro, Georgia. After more than a decade of construction and spiraling costs, Plant Vogtle Unit Three, the first of two new reactors at the site, started producing power at its full capacity in May. It’s expected to come online this month after a final round of tests.

The completion of the new reactors is a major milestone not just for the long-delayed project but for nuclear energy in the United States. The new units at Plant Vogtle were the first nuclear construction approved in decades and are the country’s only new reactors in progress.

Here is the full story.  Better than nothing, but not entirely encouraging either.  Elsewhere, transmission builds (non-nuclear) are declining.

The Poop Detective

Wastewater surveillance is one of the few tools that we can use to prepare for a pandemic and I am pleased that it is expanding rapidly in the US and around the world. Every major sewage plant in the world should be doing wastewater surveillance and presenting the results to the world on a dashboard.

I was surprised to learn that wastewater surveillance is now so good it can potentially lock on to viral RNA from a single infected individual. An individual with an infection from a common SARS-CoV-2 lineage like omicron won’t jump out of the data but there are rare, “cryptic lineages” which may be unique to a single individual.

Marc Johnson, a virologist at the University of Missouri and one of the authors of a recent paper on cryptic lineages in wastewater, believes he has evidence for a single infected individual who likely lives in Columbus, Ohio but works in the nearby town, Washington Court House. In other words, they poop mostly at home but sometimes at work.

Twitter: First, the signal is almost always present in the Columbus Southerly sewershed, but not always at Washington Court House. I assume this means the person lives in Columbus and travels to WCH, presumably for work. Second, the signal is increasing with time. Washington Court House had its highest SARS-CoV-2 wastewater levels ever in May, and the most recent sequencing indicates that this is entirely the cryptic lineage.

Moreover the person is likely quite sick:

Third, I’ve tried to calculate how much viral material this person is shedding. (Multiply the cryptic concentration by the total volume). I’ve done this several times and gotten pretty consistent results. They are shedding a few trillion (10^12) genomes/day. What does this tell us? How much tissue is infected? It’s impossible to know for sure. Chronically infected cells probably don’t release much, but acutely infected cells produce a lot more. I gather a typical output in the lab is around 1,000 virus per infected cell. If we assume we are getting 1,000 viral particles per infected cell, that would mean there are at least a billion infected cells. The density of monolayer epithelial cells is around 300k cells/sq cm. A billion cells would represent around 3.5 square feet of epithelial tissue! Don’t get me wrong. The intestines have a huge surface area and 3 square feet is a tiny fraction of the total. But it’s still a massive infection, no matter how you slice it….My point is that this patient is not well, even if they don’t know it, but they could probably be helped if they were identified.

…If you are the individual, let me know. There is a lab in the US that can do ‘official’ tests for COVID in stool, and there are doctors that I can put you in contact with that would like to try to help you.
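The back-of-the-envelope calculation in the thread can be reproduced in a few lines. This is a minimal sketch using only the figures quoted above; the variable names are mine, and "a few trillion" is taken at its lower bound of 10^12, which is why the result is a floor ("at least a billion infected cells").

```python
# Figures quoted in the thread (lower bound for "a few trillion").
GENOMES_PER_DAY = 1e12       # viral genomes shed per day
VIRUS_PER_CELL = 1_000       # assumed output per infected cell
CELLS_PER_SQ_CM = 300_000    # density of monolayer epithelial cells

# Unit conversion: 1 foot = 30.48 cm, so 1 sq ft = 30.48**2 sq cm.
SQ_CM_PER_SQ_FT = 30.48 ** 2

infected_cells = GENOMES_PER_DAY / VIRUS_PER_CELL
area_sq_cm = infected_cells / CELLS_PER_SQ_CM
area_sq_ft = area_sq_cm / SQ_CM_PER_SQ_FT

print(f"infected cells: {infected_cells:.2e}")
print(f"infected area:  {area_sq_cm:,.0f} sq cm = {area_sq_ft:.2f} sq ft")
```

The area comes out a little above 3.5 square feet, consistent with the figure in the thread.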

So if you poop in Columbus Ohio and occasionally in Washington Court House and have been having some GI issues contact Marc!

Hat tip to Marc for using the twitter handle @SolidEvidence.

Can the Shingles Vaccine Prevent Dementia?

A new paper provides good evidence that the shingles vaccine can prevent dementia, which strongly suggests that some forms of dementia are caused by the varicella zoster virus (VZV), the virus that on initial infection causes chickenpox. The data come from Wales where the herpes zoster vaccine (Zostavax) first became available on September 1 2013 and was rolled out by age. At that time, however, it was decided that the vaccine would only be available to people born on or after September 2 1933. In other words, the vaccine was not made available to 80 year olds but it was made available to 79 year and 364-day olds. (I gather the reasoning was that the benefits of the vaccine decline with age and an arbitrary cut point was chosen.)
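The sharpness of the cutoff is easy to check with date arithmetic. Here is a minimal sketch, assuming only the two dates given above; the function names are hypothetical and this is not the authors' code.

```python
from datetime import date

ROLLOUT = date(2013, 9, 1)      # vaccine first became available in Wales
CUTOFF_DOB = date(1933, 9, 2)   # eligible if born on or after this date


def is_eligible(dob):
    """Eligibility rule as described in the post."""
    return dob >= CUTOFF_DOB


def age_on(dob, day):
    """Return (completed years, days since last birthday) on a given day.

    Sketch only: does not handle a Feb 29 date of birth.
    """
    had_birthday = (day.month, day.day) >= (dob.month, dob.day)
    years = day.year - dob.year - (0 if had_birthday else 1)
    last_birthday = date(dob.year + years, dob.month, dob.day)
    return years, (day - last_birthday).days


for dob in (date(1933, 9, 1), date(1933, 9, 2)):
    years, days = age_on(dob, ROLLOUT)
    print(dob, "eligible" if is_eligible(dob) else "not eligible",
          f"age at rollout: {years} years, {days} days")
```

Two people born one day apart land on opposite sides of the rule, which is what makes the comparison close to a natural experiment.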

The cutoff date for vaccine eligibility means that people born within a week of one another have very different vaccine uptakes. Indeed, the authors show that only 0.01% of patients who were just one week too old to be eligible were vaccinated compared to 47.2% among those who were just one week younger. The two groups of otherwise similar individuals who were born around September 2 1933 are then tracked for up to seven years, 2013-2020. The individuals who were just “young” enough to be vaccinated are less likely to get shingles compared to the individuals who were slightly too old to be vaccinated (as one would expect if the vaccine is doing its job). But, the authors also show that the individuals who were just young enough to be vaccinated are less likely to get dementia compared to the individuals who were slightly too old to be vaccinated, especially among women. A number of robustness tests find no other sharp discontinuities in treatments or outcomes around the Sept 2, 1933 cut point.

The following graph summarizes. The top left panel shows that the cutoff led to big differences in vaccine uptake; the top right panel shows that there was a smaller but sharp decline in dementia in the vaccinated group. The bottom panel shows that there was no discontinuity in a variety of other factors.

Read the whole thing.

I have had my shingles vaccine. As I have said before, vaccination is the gift of a superpower.

Is software eating Japan? (from my email)

I came across a great series of posts by Richard Katz about Japan and information technology. It shows that software has not eaten Japan (for now?).

Some interesting facts:

– “by 2025, 60% of Japan’s large companies will be operating core systems that are more than 20 years old. Would anyone today use a 2005 PC?”

– “It is worth noting that the OECD shows Japan suffering an actual drop in output per employee in ICT business services from 2005 through 2021, its ICT productivity having peaked out in 1999. By contrast, Korea’s productivity grew 41% in the same period.”

– “Among high school boys, Japan ranks number one in using the Internet to search for information on a particular topic several times a day, and Japan’s girls come in second among OECD girls. Japanese boys also rank number one in single-player online games. On the other hand, these boys score at, or near, the bottom on other activities, such as multi-player online games or uploading their own content”.

– In an OECD study, Japan is third from the bottom when it comes to the share of students “who foresee having a career, not just in ICT, but in any area of science or engineering” [that was quite shocking to me, to be honest. Positively shocking, on the other hand, was that Portugal leads the pack]

– “What’s even more remarkable is that Japan’s top performers on the math and science tests come in dead last in the share who foresee themselves working in science or engineering”.

– Maybe this provides a bit of an explanation: “In 2021, the average annual income of a Japanese ICT staffer was just ¥4.38 million ($34,466), down 4% from 2019. That was 2% below the median salary in Japan, whereas in the US and China, ICT salaries are 8-10% above the median.”

sources: https://richardkatz.substack.com/p/2025-digital-cliff-part-i

https://richardkatz.substack.com/p/metis-2025-digital-cliff-part-ii

That is all from Krzysztof Tyszka-Drozdowski.

The importance of cognitive endurance

Schooling may build human capital not only by teaching academic skills, but by expanding the capacity for cognition itself. We focus specifically on cognitive endurance: the ability to sustain effortful mental activity over a continuous stretch of time. As motivation, we document that globally and in the US, the poor exhibit cognitive fatigue more quickly than the rich across field settings; they also attend schools that offer fewer opportunities to practice thinking for continuous stretches. Using a field experiment with 1,600 Indian primary school students, we randomly increase the amount of time students spend in sustained cognitive activity during the school day—using either math problems (mimicking good schooling) or non-academic games (providing a pure test of our mechanism). Each approach markedly improves cognitive endurance: students show 22% less decline in performance over time when engaged in intellectual activities—listening comprehension, academic problems, or IQ tests. They also exhibit increased attentiveness in the classroom and score higher on psychological measures of sustained attention. Moreover, each treatment improves students’ school performance by 0.09 standard deviations. This indicates that the experience of effortful thinking itself—even when devoid of any subject content—increases the ability to accumulate traditional human capital. Finally, we complement these results with quasi-experimental variation indicating that an additional year of schooling improves cognitive endurance, but only in higher-quality schools. Our findings suggest that schooling disparities may further disadvantage poor children by hampering the development of a core mental capacity.

Here is the full paper by Christina Brown, Supreet Kaur, Geeta Kingdon, and Heather Schofield, via the excellent Kevin Lewis.

I should note that I view this as one of the areas where I feel I have trained myself best.  I recall last year having serious airport travel problems, and a trip that should have taken two hours ended up being almost twelve hours, with uncertainty along the way and two different last-minute improvised airport stops and no lounges.  But I still was able to read, concentrate, and work for the entire period without feeling any major drain.  As for this paper, it is yet another way that schooling teaches the median student something, even though he/she cannot regurgitate any particular lesson on demand.

The FDA Still Doesn’t Trust Women

The FDA has a long history of antipathy towards personal testing. The FDA has opposed personal pregnancy tests, HIV tests, genetic tests, and COVID tests, as I discussed in my article Testing Freedom. Well, the FDA is at it again:

NYTimes: At a hearing Tuesday to consider whether the Food and Drug Administration should authorize the country’s first over-the-counter birth control pill, a panel of independent medical experts advising the agency was left to reckon with two contradictory analyses of the medication called Opill.

During the eight-hour session, the manufacturer of the pill, HRA Pharma, which is owned by Perrigo, and representatives of many medical organizations and reproductive health specialists said that data strongly supported approval. They said that Opill, approved as a prescription drug 50 years ago, was safe, effective and easy for women of all ages to use appropriately — and that over-the-counter availability was sorely needed to lower the country’s high rate of unintended pregnancies.

In contrast, F.D.A. scientists questioned the reliability of company data that was intended to show that consumers would take the pill at roughly the same time every day and comply with directions to abstain from sex or temporarily use other birth control if they missed a dose. The agency seemed especially concerned about whether women with breast cancer or unexplained vaginal bleeding would correctly choose not to take Opill and whether adolescents and people with limited literacy would use it accurately.

Note carefully: The FDA isn’t worried that women won’t take the pill at the same time every day; they are worried that women who get the pill without a prescription won’t take it at the same time every day. I guess in the FDA’s view women need some mansplaining to take birth control or at least some doctorplaining.

Dr. Westhoff suggested that for most women, there is no advantage to a doctor prescribing the pills because doctors don’t typically monitor patient adherence and often only see such patients once a year.

Similarly, I suspect that women with breast cancer will be concerned enough about their health to read the warning, Don’t Take This Pill if You Have Breast Cancer. Who knows, women with breast cancer might even ask their cancer physician or Google or their GP(T) about what foods and drugs to take and which to avoid.

If I didn’t know the FDA’s long history of opposing personal testing, I would think this simply bizarre, but not trusting people with their own health decisions is practically in the FDA’s DNA.

Lessons from the COVID War

In preparation for a National Covid Commission a group of scholars directed by Philip Zelikow (director of the 9/11 Commission) began interviewing people and organizing task forces (I was an interviewee). The Covid Commission didn’t happen, a fact that illustrates part of the problem:

The policy agenda of both major American political parties appear mostly undisturbed by this pandemic. There is no momentum to fix the system….The Covid war revealed a collective national incompetence in governance….One common denominator stands out to us that spans the political spectrum. Leaders have drifted into treating this pandemic as if it were an unavoidable national catastrophe.

The results of this early investigation, however, are summarized in Lessons from the COVID War. Overall, a good book, not as pointed or data driven as I might have liked (see my talk for a more pointed overview), but I am in large agreement with the conclusions and it does contain some clarifying tidbits such as this one on the Obama playbook.

Innumerable speeches, books, and articles have stated that the Obama administration gave the incoming Trump administration a “playbook” on how to confront a pandemic and that this playbook was ignored. The Obama administration did indeed prepare and leave behind the “Playbook for Early Response to High-Consequence Emerging Infectious Disease Threats and Biological Incidents.”

But this playbook did not actually diagram any plays. There was no “how.” It did not explain what to do…when it came to the job of how to contain a pandemic that was headed for the United States in January 2020, the playbook was a blank page.

I also appreciated that Lessons has some unheralded success stories from the state and local level. You may recall Tyler and I blogging repeatedly in 2020 about the advantages of pooled tests. Eventually pooled testing was approved but I haven’t seen data on how widely pooling was adopted or the effective increase in testing capacity that was produced. Lessons, however, offers an anecdote:

In San Antonio, a local charitable foundation paired with a blood bank to create a central Covid PCR testing lab (antigen tests were not yet readily available) that could combine samples (pooling) for efficiency and cost reduction, but also determine which individual in a pool was positive. Importantly, results were available within about twelve hours. That meant results were available before the start of school the new day.

The program helped San Antonio get kids back into the schools.

More generally, it’s striking that US schools were closed for far longer than French, German or Italian schools. See data at right on the number of weeks that “schools were closed, or partly closed, to in-person instruction because of the pandemic (from Feb. 2020-March 2022)”. (South Korea, it should be noted, had some of the most advanced online education systems in the world.)

One general point made in Lessons that I wholeheartedly agree with is that the school closures and many of the other controversial aspects of the pandemic response such as the lockdowns and mask mandates “were really symptoms of the deep problem. Without a more surgical toolkit, only blunt instruments were left.” With better testing, biomedical surveillance of the virus and honest communication we could have done better with much less intrusive and costly policies.

Addendum: See my previous reviews of Gottlieb’s Uncontrolled Spread, Michael Lewis’s The Premonition, Slavitt’s Preventable and Abutaleb and Paletta’s Nightmare Scenario.

Addendum 2: A typo in Lessons had France closing schools for 2 weeks instead of 12 weeks. Corrected.