How badly do humans misjudge AIs?

We study how humans form expectations about the performance of artificial intelligence (AI) and consequences for AI adoption. Our main hypothesis is that people project human-relevant problem features onto AI. People then over-infer from AI failures on human-easy tasks, and from AI successes on human-difficult tasks. Lab experiments provide strong evidence for projection of human difficulty onto AI, predictably distorting subjects’ expectations. Resulting adoption can be sub-optimal, as failing human-easy tasks need not imply poor overall performance in the case of AI. A field experiment with an AI giving parenting advice shows evidence for projection of human textual similarity. Users strongly infer from answers that are equally uninformative but less humanly-similar to expected answers, significantly reducing trust and engagement. Results suggest AI “anthropomorphism” can backfire by increasing projection and de-aligning human expectations and AI performance.

That is from a new paper by Raphael Raux, job market candidate from Harvard.  The piece is co-authored with Bnaya Dreyfuss.

AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably

That is the title of a new paper in Nature, here is part of the abstract:

We conducted two experiments with non-expert poetry readers and found that participants performed below chance levels in identifying AI-generated poems (46.6% accuracy, χ²(1, N = 16,340) = 75.13, p < 0.0001). Notably, participants were more likely to judge AI-generated poems as human-authored than actual human-authored poems (χ²(2, N = 16,340) = 247.04, p < 0.0001). We found that AI-generated poems were rated more favorably in qualities such as rhythm and beauty, and that this contributed to their mistaken identification as human-authored. Our findings suggest that participants employed shared yet flawed heuristics to differentiate AI from human poetry: the simplicity of AI-generated poems may be easier for non-experts to understand, leading them to prefer AI-generated poetry and misinterpret the complexity of human poems as incoherence generated by AI.

By Brian Porter and Edouard Machery.  I do not think that pointing out the poor quality of human taste much dents the import of this result.

How to make DOGE work

That is the topic of my latest Bloomberg column, here is one excerpt:

Another priority should be to deregulate medical trials. America is now in a golden age of medical discovery, with mRNA vaccines, anti-malaria vaccines, GLP-1 weight loss drugs and new treatments against cancer all showing great promise. AI may bring about still more advances.

Unfortunately, the US system of clinical trials remains a major obstacle to turning all this science into medicine. There are regulations concerning hospital protocols, the design of the trials, FDA requirements, the procedures of universities and institutional review boards, and the handling of data, among other barriers. America can have better and speedier approval procedures without lowering its standards.

Of all the tasks I’ve outlined, this is by far the most difficult, because it involves changes in so many different kinds of institutions. Yet it has one of the highest possible payoffs, because more treatments might be developed and made available if the clinical trial process weren’t so onerous. Reforming clinical trials should also appeal to older Americans, who are especially likely to vote and who think the most about their medical care. The goal should be an America where most people live to 90.

Many Republicans are very excited about DOGE. But its governance structure is undefined and untested. It does not have a natural home or an enduring constituency. It cannot engage in much favor-trading. Its ability to keep Trump’s attention and loyalty may prove limited. And it’s not clear that deregulation is a priority for many voters.

The more I read about DOGE from Vivek and Musk, the more I feel it needs a greater sense of prioritization.

Saturday assorted links

1. Criminals are targeting luxury cheeses.

2. twodw: “Harris lost all seven swing states, but in five of them – Georgia, Michigan, Nevada, North Carolina, Wisconsin – she’s ending up with more raw votes than Biden got in 2020. If you still think all those swing state mail votes in 2020 were fake, idk what to tell you.”

3. Evo, for biology.

4. Gremlin, for UFO hunting.

5. Influence of Bell Labs.

The new Roger Penrose biography

The author is Patchen Barss, and the title is The Impossible Man: Roger Penrose and the Cost of Genius.  I liked this book very much, and feel there should be more works like this.  It was made with the full cooperation of Penrose himself, though he had no veto over the final work.  Here is one bit:

Many relativists had a powerful feel for formal math: errors in calculations leapt out at them the way off-key notes rankle a musician’s ear.  Not Roger.  Equations required too much mental labour and restricted his creativity.  His “magic” came from the shape of things.  He preferred to run his fingers along the curves and twists of space and time and find in those graceful lines the story of how every particle, force, and phenomenon acquired its properties.

The book covers Penrose’s personal life as well:

Judith encouraged him to sort out his relationship with Joan independently of his feelings for her.  He wasn’t sure that made sense.  In a deterministic universe, could he really take ownership of his unhappy marriage?  The idea felt strange to him.

Returning to physics:

Roger’s curiosity about consciousness came from many places — the extreme mental feats of his father and brothers, the speed of decision making in racquet sports, the human ability to transcend Kurt Gödel’s incompleteness theorem, and his own capacity to discover new mathematical insights.

Then again, one might return to matters of his personal life:

…he [Penrose] offhandedly observed how many scientists — not just him — seem to have “troubled marriages.”  He implied that a solitary life might be an inevitable consequence, a necessary price, for his kind of success.  True, Wolfgang and Ted [friends] had long, happy marriages.  Then again, neither of them had won a Nobel Prize.

His tone was one of justification rather than regret.  He didn’t see how it could be any other way.

Recommended.  Here is a good NYT review.

Friday assorted links

1. Michael Magoon has “progress” recommendations for the second Trump administration.

2. Update on LLMs and chess.  And a further update on that.

3. Podcast on speaking to whales.

4. Bitcoin for the sovereign, by Matt Huang.

5. Yes Woke has peaked.

6. Berlin techno clubs are closing in large numbers (Times of London).

7. Stripe: “Adding payments to your LLM agentic workflows.”

Will Trump Appoint a Great FDA Commissioner?

A German newspaper asked for my take on the nomination of RFK Jr. to head HHS. Here’s what I said:

Operation Warp Speed stands as the crowning achievement of the first Trump administration, exemplifying the impact of a bold public-private partnership. OWS accelerated vaccine development, production, and distribution beyond what most experts thought possible, saving hundreds of thousands of American lives and demonstrating the power of American ingenuity in a time of crisis.

By nominating Robert F. Kennedy Jr., a prominent anti-vaccine activist, President Trump undermines his own legacy, and casts doubt on his administration’s commitment to protecting American lives through science-driven health policy.

Many better choices are available. Here is my 2017 post on potential people to head the FDA, many of whom would also be great at HHS. No indent. Key points remain true.

As someone who has written about FDA reform for many years it’s gratifying that all of the people whose names have been floated for FDA Commissioner would be excellent, including Balaji Srinivasan, Jim O’Neill, Joseph Gulfo, and Scott Gottlieb. Each of these candidates understands two important facts about the FDA. First, that there is a fundamental tradeoff–longer and larger clinical trials mean that the drugs that are approved are safer but at the price of increased drug lag and drug loss. Unsafe drugs create concrete deaths and palpable fear but drug lag and drug loss fill invisible graveyards. We need an FDA commissioner who sees the invisible graveyard.

Each of the leading candidates also understands that we are entering a new world of personalized medicine that will require changes in how the FDA approves medical devices and drugs. Today almost everyone carries in their pocket the processing power of a 1990s supercomputer. Smartphones equipped with sensors can monitor blood pressure, perform ECGs and even analyze DNA. Other devices being developed or available include contact lenses that can track glucose levels and eye pressure, devices for monitoring and analyzing gait in real time and head bands that monitor and even adjust your brain waves.

The FDA has an inconsistent, even schizophrenic, attitude towards these new devices: some have been approved and yet at the same time the FDA has banned 23andMe and other direct-to-consumer genetic testing companies from offering some DNA tests because of “the risk that a test result may be used by a patient to self-manage”. To be sure, the FDA and other agencies have a role in ensuring that a device or test does what it says it does (the Theranos debacle shows the utility of that oversight). But the FDA should not be limiting the information that patients may discover about their own bodies or the advice that may be given based on that information. Interference of this kind violates the First Amendment and the long-standing doctrine that the FDA does not control the practice of medicine.

Srinivasan is a computer scientist and electrical engineer who has also published in the New England Journal of Medicine, Nature Biotechnology, and Nature Reviews Genetics. He’s a co-founder of Counsyl, a genetic testing firm that now tests ~4% of all US births, so he understands the importance of the new world of personalized medicine.

The world of personalized medicine also impacts how new drugs and devices should be evaluated. The more we look at people and diseases the more we learn that both are radically heterogeneous. In the past, patients have been classified and drugs prescribed according to a handful of phenomenological characteristics such as age and gender and occasionally race or ethnic background. Today, however, genetic testing and on-the-fly examination of RNA transcripts, proteins, antibodies and metabolites can provide a more precise guide to the effect of pharmaceuticals in a particular person at a particular time.

Greater targeting is beneficial but as Peter Huber has emphasized it means that drug development becomes much less a question of does this drug work for the average patient and much more about, can we identify in this large group of people the subset who will benefit from the drug? If we stick to standard methods that means even larger and more expensive clinical trials and more drug lag and drug delay. Instead, personalized medicine suggests that we allow for more liberal approval decisions and improve our techniques for monitoring individual patients so that physicians can adjust prescribing in response to the body’s reaction. Give physicians a larger armory and let them decide which weapon is best for the task.

I also agree with Joseph Gulfo (writing with Briggeman and Roberts) that in an effort to be scientific the FDA has sometimes fallen victim to the fatal conceit. In particular, the ultimate goal of medical knowledge is increased life expectancy (and reducing morbidity) but that doesn’t mean that every drug should be evaluated on this basis. If a drug or device is safe and it shows activity against the disease as measured by symptoms, surrogate endpoints, biomarkers and so forth then it ought to be approved. It often happens, for example, that no single drug is a silver bullet but that combination therapies work well. But you don’t really discover combination therapies in FDA approved clinical trials–this requires the discovery process of medical practice. This is why Vincent DeVita, former director of the National Cancer Institute, writes in his excellent book, The Death of Cancer:

When you combine multidrug resistance and the Norton-Simon effect, the deck is stacked against any new drug. If the crude end point we look for is survival, it is not surprising that many new drugs seem ineffective. We need new ways to test new drugs in cancer patients, ways that allow testing at earlier stages of disease….

DeVita is correct. One of the reasons we see lots of trials for end-stage cancer, for example, is that you don’t have to wait long to count the dead. But no drug has ever been approved to prevent lung cancer (and only six have ever been approved to prevent any cancer) because the costs of running a clinical trial for long enough to count the dead are just too high to justify the expense. Preventing cancer would be better than trying to deal with it when it’s ravaging a body but we won’t get prevention trials without changing our standards of evaluation.

Jim O’Neill, managing director at Mithril Capital Management and a former HHS official, is an interesting candidate precisely because he also has an interest in regenerative medicine. With a greater understanding of how the body works we should be able to improve health and avoid disease rather than just treating disease but this will require new ways of thinking about drugs and evaluating them. A new and non-traditional head of the FDA could be just the thing to bring about the necessary change in mindset.

In addition to these big-ticket items, there are also a lot of simple changes that could be made at the FDA. Scott Alexander at Slate Star Codex has a superb post discussing reciprocity with Europe and Canada so we can get (at the very least) decent sunscreen and medicine for traveler’s diarrhea. Also, allowing any major pharmaceutical firm to produce any generic drug without going through an expensive approval process would be a relatively simple change that would shut down people like Martin Shkreli who exploit the regulatory morass for private gain.

The head of the FDA has tremendous power, literally the power of life and death. It’s exciting that we may get a new head of the FDA who understands both the peril and the promise of the position.

*Kaput: The End of the German Miracle*

By Wolfgang Münchau, this book is the best and most detailed account of the German economic decline to date.  Excerpt:

In 2018, the federal government promised that Germany would become a world leader in artificial intelligence.

As if they don’t understand that such efforts are more than just a play toy.  The overall lesson I took away from this book (my interpretation, not the author’s) is that if a country does not have enough ambition and seriousness in its businesses and education systems, sooner or later it will not have that in its government either.

Another lesson is that the world, overall, is working less well than you might think.

I am sad to recommend this one.  My primary reservation is that the author does not do enough to diagnose Germany’s obvious cultural malaise as an underlying root cause.

Let’s reform taxation for expats

That is the topic of my latest Bloomberg column.  We should tax on the basis of residency, not citizenship.  Only Eritrea shares with us the practice of doing the latter.  Here is one excerpt:

There is another possible gain, one which may have more appeal to the incoming administration than to economists like me. Trump and his advisers have long worried about the US trade deficit, and there has been talk of taxing foreign direct investment to weaken the dollar and boost US exports. Rather than discourage foreign investment in the US, why not do something more positive — and encourage US investment abroad?

If Americans leave and start new businesses around the world, using previously domestic capital, that too will bring downward pressure on the dollar. Thus could Trumpian ends be achieved by more constructive means. As a bonus, such a plan would not alienate foreign countries, as they tend to view a tax on FDI as a tax on their citizens.

This tax reform also might benefit US exports more directly. Say a US company wants to send an employee abroad to do market research and explore distribution channels for future sales. The US tax system should not turn this into a difficult ordeal.

Encouraging more Americans to work abroad also is a form of foreign aid, as many of them will grow or start businesses, creating jobs and tax revenue for the foreign country. And it is a form of foreign aid that benefits US citizens rather than costing them money. In fact, given the prowess of US business, it may be one of America’s most effective forms of foreign aid.

This one we should just do.