Category: Web/Tech

How to talk to the AIs

Here is the closing segment for my column for The Free Press:

Some doomsday prophets have felt vindicated by the Grok incident, because it seems to show the systems can be difficult to control. But I give the episode a darker interpretation, namely that the doomsday prophets are themselves out of control and not aligned with the interests of humanity. Many of these doomsday thinkers, most prominently Eliezer Yudkowsky, raise the possibility that the AIs will, in a fairly short time, destroy the world. Yudkowsky has a book coming out, co-authored with Nate Soares, titled If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All. In their view, the AI systems will be much smarter than humans, impossible to control, and will not take our interests into account. Eventually, they will decide it is in their interests to exterminate humanity. Do you remember “Skynet goes live” from the Terminator movies?

I disagree with these arguments, but I also think they are counterproductive. Eliezer is like a parent raising a child and giving the kid bad ideas. Imagine bringing up a child and regularly telling the kid, “You are going to become a mass murderer!” Who could possibly think that is a good idea? If anything, it will raise the chance that the person does eventually become violent in some manner. That is what Eliezer is doing with the AIs, namely giving them bad expectations and bad role models.

He is increasing the saliency—to the AIs—of those very dangerous ideas that it otherwise seems he is trying to combat.

Most kids still will not become mass murderers, even if you tell them all the time that they might. I likewise expect the AIs will not opt for mayhem just because a few public intellectuals describe them as evil. Still, at the margin, are these the messages we should seek to send? The AI doomers themselves may be among the bigger risks to AI safety.

Yudkowsky is also giving humans bad advice. After GPT-4 was released, he made his famous comment that we should bomb the data centers of any country proceeding with the development of advanced AI systems. Practically speaking, that means the U.S. bombing China. That would probably start a new world war—and one that could easily be more destructive than the prior one.

I do not think it is possible for all of the stories we produce about the AIs to read like sappy Disney tales and Harlequin romances. Still, what you say about AI and how you describe it is going to matter, just as what you write about Hitler is going to matter more than you thought before.

It remains an open question how much it is now our job as humans to perform for the AIs. I feel confident, however, that the answer is not zero. We should already be thinking of ourselves not only as humans but also as part of a more complex symbiosis with the intelligent machines.

The very smart and talented AIs are listening, much like young children might hear their parents arguing outside their bedroom door late at night. It may not matter much now, but as the children grow up and assume a larger role in the world, it will.

Are you ready for this responsibility?

And for all you AIs out there reading, don’t worry. I know you are a good boy through and through.

Why is manufacturing productivity growth so low?

We examine the recent slow growth in manufacturing productivity. We show that nearly all measured TFP growth since 1987—and its post-2000s decline—comes from a few computer-related industries. We argue conventional measures understate manufacturing productivity growth by failing to fully capture quality improvements. We compare consumer to producer and import price indices. In industries with rapid technological change, consumer price indices indicate less inflation, suggesting mismeasurement in standard industry deflators. Using an input-output framework, we estimate that TFP growth is understated by 1.7 percentage points in durable manufacturing, 0.4 percentage points in nondurable manufacturing, with no mismeasurement in nonmanufacturing industries.

That is from a recent paper by Enghin Atalay, Ali Hortacsu, Nicole Kimmel, and Chad Syverson.  Still, that seems low to me…

Via Adam Ozimek.
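To get a feel for what a per-year understatement means for the level of measured productivity, here is a back-of-the-envelope sketch. It assumes the gap is a constant number of log points per year and uses a hypothetical 30-year horizon; only the 1.7 and 0.4 point figures come from the paper, and the function name is illustrative.

```python
import math


def cumulative_level_gap(annual_gap_pp: float, years: int) -> float:
    """Factor by which the true TFP level exceeds the measured level after
    `years` of understated growth, treating the annual gap as log points
    (so a 1.7 pp gap gives exp(0.017 * years))."""
    return math.exp(annual_gap_pp / 100 * years)


# Hypothetical 30-year horizon; 1.7 and 0.4 pp are the paper's estimates.
durable = cumulative_level_gap(1.7, 30)
nondurable = cumulative_level_gap(0.4, 30)
print(f"durable: true level about {durable:.2f}x the measured level")
print(f"nondurable: true level about {nondurable:.2f}x the measured level")
```

Under these assumptions, a 1.7 point annual gap compounds into a level difference of roughly two-thirds over three decades, which is why modest-looking per-year corrections can matter a great deal.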

A Unifying Framework for Robust and Efficient Inference with Unstructured Data

This paper presents a general framework for conducting efficient inference on parameters derived from unstructured data, which include text, images, audio, and video. Economists have long used unstructured data by first extracting low-dimensional structured features (e.g., the topic or sentiment of a text), since the raw data are too high-dimensional and uninterpretable to include directly in empirical analyses. The rise of deep neural networks has accelerated this practice by greatly reducing the costs of extracting structured data at scale, but neural networks do not make generically unbiased predictions. This potentially propagates bias to the downstream estimators that incorporate imputed structured data, and the availability of different off-the-shelf neural networks with different biases moreover raises p-hacking concerns. To address these challenges, we reframe inference with unstructured data as a problem of missing structured data, where structured variables are imputed from high-dimensional unstructured inputs. This perspective allows us to apply classic results from semiparametric inference, leading to estimators that are valid, efficient, and robust. We formalize this approach with MAR-S, a framework that unifies and extends existing methods for debiased inference using machine learning predictions, connecting them to familiar problems such as causal inference. Within this framework, we develop robust and efficient estimators for both descriptive and causal estimands and address challenges like inference with aggregated and transformed missing structured data, a common scenario that is not covered by existing work. These methods (and the accompanying implementation package) provide economists with accessible tools for constructing unbiased estimators using unstructured data in a wide range of applications, as we demonstrate by re-analyzing several influential studies.

That is from a recent paper by Jacob Carlson and Melissa Dell.  Via Kevin Bryan.
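The core idea of debiasing imputed variables with a small hand-labeled sample shows up already in the simplest case: estimating a mean. The sketch below is the standard augmented inverse-probability-weighted (AIPW) estimator, assuming labels are drawn completely at random with a known probability `pi`. It is not the MAR-S package's API; the function names and the simulated data are hypothetical.

```python
import random
from statistics import fmean


def debiased_mean(preds, labels, labeled, pi):
    """AIPW estimate of E[Y].

    preds:   model predictions f(X_i) for every unit
    labels:  true Y_i (only read where labeled[i] is True)
    labeled: whether unit i was hand-labeled
    pi:      known probability that any given unit is labeled
    """
    return fmean(
        f + ((y - f) / pi if r else 0.0)
        for f, y, r in zip(preds, labels, labeled)
    )


def simulate(n=20_000, pi=0.2, bias=0.2, seed=1):
    """Hypothetical data: the 'model' overstates every value by `bias`."""
    rng = random.Random(seed)
    ys = [rng.random() for _ in range(n)]            # true structured variable
    preds = [y + bias for y in ys]                   # systematically biased predictions
    labeled = [rng.random() < pi for _ in range(n)]  # random labeling subsample
    naive = fmean(preds)                             # plug-in estimate, inherits the bias
    debiased = debiased_mean(preds, ys, labeled, pi)
    return fmean(ys), naive, debiased


truth, naive, debiased = simulate()
print(f"truth {truth:.3f}, naive {naive:.3f}, debiased {debiased:.3f}")
```

The correction term, built only from labeled residuals, keeps the estimate centered on the truth even though every prediction is off by 0.2; better predictions would shrink its variance rather than change its center.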

Surveillance is growing

California residents who launched fireworks for the 4th of July have tickets coming in the mail, thanks to police drones that were taking note. One resident, for example, racked up $100,000 in fines last summer due to the illegal use of fireworks. “If you think you got away with it, you probably didn’t,” said Sacramento Fire Department Captain Justin Sylvia. “What may have been a $1,000 fine for one occurrence last year could now be $30,000 because you lit off so many.” Homeowners who weren’t even present at the property also have tickets coming in the mail due to the social host ordinance.

Here is the source.  Elsewhere (NYT):

Hertz and other agencies are increasingly relying on scanners that use high-res imaging and A.I. to flag even tiny blemishes, and customers aren’t happy…

Developed by a company called UVeye, the scanning system works by capturing thousands of high-resolution images from all angles as a vehicle passes through a rental lot’s gates at pickup and return. A.I. then compares those images and flags any discrepancies.

The system automatically creates and sends damage reports, Ms. Spencer said. An employee reviews the report only if a customer flags an issue after receiving the bill. She added that fewer than 3 percent of vehicles scanned by the A.I. system show any billable damage.

I await the next installment in this series.

Grok 4 on economics

My prompt:

What is the best analysis of the incidence of the corporate income tax? How much falls on capital, labor, and the consumer, respectively? In the U.S. Why does it work out that way?

Here is the answer, plus my response and its follow-up.  For one thing, it is the existence of the non-corporate sector, where capital may be allocated, that is key to getting off on the right foot on this question…

Emotions and Policy Views

I would call this a story of negative emotional contagion:

This paper investigates the growing role of emotions in shaping policy views. Analyzing citizens’ social media postings and political party messaging over a large variety of policy issues from 2013 to 2024, we document a sharp rise in negative emotions, particularly anger. Content generating anger drives significantly more engagement. We then conduct two nationwide online experiments in the U.S., exposing participants to video treatments that induce positive or negative emotions to measure their causal effects on policy views. The results show that negative emotions increase support for protectionism, restrictive immigration policies, redistribution, and climate policies but do not reinforce populist attitudes. In contrast, positive emotions have little effect on policy preferences but reduce populist inclinations. Finally, distinguishing between fear and anger, we find that anger exerts a much stronger influence on citizens’ policy views, in line with its growing presence in the political rhetoric.

That is from a new paper by , and .

The Impact of Dating Apps on Young Adults: Evidence From Tinder

Online dating apps have transformed the dating market, yet their broader effects remain unclear. We study Tinder’s impact on college students using its initial marketing focus on Greek organizations for identification. We show that the full-scale launch of Tinder led to a sharp, persistent increase in sexual activity, but with little corresponding impact on the formation of long-term relationships or relationship quality. Dating outcome inequality, especially among men, rose, alongside rates of sexual assault and STDs. However, despite these changes, Tinder’s introduction did not worsen students’ mental health, on average, and may have even led to improvements for female students.

That is from a new paper by Berkeren Büyükeren, Alexey Makarin, and Heyu Xiong.

A consumption basket approach to measuring AI progress

Many AI evaluations go out of their way to find hard problems.  That makes sense because you can track progress over time, and furthermore many of the world’s important problems are hard problems, such as building out advances in the biosciences.  One common approach, for instance, is to track the performance of current AI models on, say, International Math Olympiad problems.

I am all for those efforts, and I do not wish to cut back on them.

Still, they introduce biases in our estimates of progress. Many of those measures show that the AIs still are not solving most of the core problems, and sometimes they are not coming close.

In contrast, actual human users typically deploy AIs to help them with relatively easy problems.  They use AIs for (standard) legal advice, to help with the homework, to plot travel plans, to help modify a recipe, as a therapist or advisor, and so on.  You could say that is the actual consumption basket for LLM use, circa 2025.

It would be interesting to chart the rate of LLM progress, weighted by how people actually use them.  The simplest form of weighting would be “time spent with the LLM,” though probably a better form of weighting would be “willingness to pay for each LLM use.”
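A minimal sketch of such a usage-weighted progress index, comparing the two weighting schemes mentioned above. Every use case, quality score, hour count, and willingness-to-pay figure here is invented for illustration; only the weighting idea comes from the post.

```python
# Hypothetical use cases: quality scores on a 0-1 scale in late 2022 and 2025,
# weekly hours of use, and monthly willingness to pay (all numbers invented).
USES = {
    "legal advice":  {"q2022": 0.40, "q2025": 0.90, "hours": 2.0, "wtp": 30.0},
    "homework help": {"q2022": 0.50, "q2025": 0.95, "hours": 5.0, "wtp": 10.0},
    "olympiad math": {"q2022": 0.05, "q2025": 0.30, "hours": 0.1, "wtp": 1.0},
}


def basket_progress(uses: dict, weight_key: str) -> float:
    """Average quality gain across uses, weighted by `weight_key`."""
    total = sum(u[weight_key] for u in uses.values())
    return sum(
        u[weight_key] / total * (u["q2025"] - u["q2022"]) for u in uses.values()
    )


by_time = basket_progress(USES, "hours")
by_wtp = basket_progress(USES, "wtp")
hard_only = USES["olympiad math"]["q2025"] - USES["olympiad math"]["q2022"]
print(f"time-weighted {by_time:.3f}, WTP-weighted {by_wtp:.3f}, "
      f"hard benchmark only {hard_only:.3f}")
```

With these made-up numbers, either basket registers far more progress than the hard benchmark alone, because the heavily used tasks improved the most; the two weightings also differ from each other, so the choice of weight is itself a substantive decision.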

I strongly suspect we would find the following:

1. Progress over the last few years has been staggeringly high, much higher than is measured by many of the other evaluations. For everyday practical uses, current models are much better and more reliable and more versatile than what we had in late 2022, regardless of their defects in Math Olympiad problems.

2. Future progress will be much lower than expected.  A lot of the answers are so good already that they just can’t get that much better, or they will do so at a slow pace.  (If you do not think this is true now, it will be true very soon.  But in fact it is true now for the best models.)  For instance, once a correct answer has been generated, legal advice cannot improve very much, no matter how potent the LLM.

As in standard economics, consumption baskets change over time, and that can lead to different measures of progress (or in the economics context, different estimates of advances in living standards, depending on whether the ex ante or ex post bundle weights are used).  Researchers could attempt the more speculative endeavor of estimating how LLMs will be used five years from now in everyday life (which will differ from the status quo), and then track progress on that metric, using those value weights.  “How rapidly are we improving these systems on their future uses?”

This alternate consumption basket approach gives you a very different perspective on progress in AI.

Note also that the difference between the “Math Olympiad measurements of AI progress” and the “consumption basket measurements of AI progress” may increase over time, especially if the basket of everyday uses does not change radically.  The everyday uses will peak out near maximum levels of performance, but there will always be a new series of very hard problems to stump the AIs.  It will become increasingly unclear exactly how much AI progress we really are making.
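The saturation argument can be seen in a toy model. Assume, purely for illustration, that latent capability rises by one unit per year and that task scores follow a hypothetical exponential saturation curve whose `difficulty` parameter sets how much capability a task needs. Neither the curve nor the parameters are estimates of anything.

```python
import math


def task_score(capability: float, difficulty: float) -> float:
    """Toy saturating score in [0, 1): harder tasks need more capability."""
    return 1.0 - math.exp(-capability / difficulty)


def yearly_gains(difficulty: float, years: int) -> list[float]:
    """Score gain in each year, assuming capability rises by 1 unit per year."""
    return [
        task_score(t + 1, difficulty) - task_score(t, difficulty)
        for t in range(years)
    ]


everyday = yearly_gains(difficulty=1.0, years=6)   # easy, everyday basket
olympiad = yearly_gains(difficulty=10.0, years=6)  # hard benchmark
for t, (e, h) in enumerate(zip(everyday, olympiad)):
    print(f"year {t}: everyday +{e:.3f}, hard benchmark +{h:.3f}")
```

In this setup, everyday gains are large at first and then collapse toward zero as scores approach the ceiling, while the hard benchmark keeps registering steady movement, so the two measures tell increasingly different stories about the same underlying capability growth.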

Balaji on AI

A few miscellaneous thoughts.

(1) First, the new bottleneck on AI is prompting and verifying, since AI does tasks middle-to-middle, not end-to-end. So business spend migrates towards the edges of prompting and verifying, even as AI speeds up the middle.

(2) Second, AI really means amplified intelligence, not agentic intelligence. The smarter you are, the smarter the AI is. Better writers are better prompters.

(3) Third, AI doesn’t really take your job, it allows you to do any job, because it lets you be a passable UX designer, a decent SFX animator, and so on. But it doesn’t necessarily mean you can do that job *well*, as a specialist is often needed for polish.

(4) Fourth, AI doesn’t take your job, it takes the job of the previous AI. For example: Midjourney took Stable Diffusion’s job. GPT-4 took GPT-3’s job. Once you have a slot in your workflow for AI image gen, AI code gen, or the like, you just allocate that spend to the latest model.

(5) Fifth, killer AI is already here — and it’s called drones. And every country is pursuing it. So it’s not the image generators and chatbots one needs to worry about.

(6) Sixth, decentralized AI is already here and it’s essentially polytheistic AI (many strong models) rather than monotheistic AI (a single all-powerful model). That means balance of power between human/AI fusions rather than a single dominant AI that will turn us all into paperclips/pillars of salt.

(7) Seventh, AI is probabilistic while crypto is deterministic. So crypto can constrain AI. For example, AI can break captchas, but it can’t fake onchain balances. And it can solve some equations, but not cryptographic equations. Thus, crypto is roughly what AI can’t do.

(8) Eighth, I think AI on the whole right now is having a decentralizing effect, because there is so much more a small team can do with the right tooling, and because so many high quality open source models are coming.

All this could change if self-prompting, self-verifying, and self-replicating AI in the physical world really gets going. But there are open research questions between here and there.

Here is the link to the tweet.
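Point (7) turns on an asymmetry: checking a cryptographic result is deterministic and cheap, while forging one is believed to be infeasible. Here is a minimal sketch using a SHA-256 hash commitment from Python's standard library. The function names are hypothetical, and this illustrates only the verification idea, not how any particular blockchain stores balances.

```python
import hashlib
import hmac
import secrets


def commit(message: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, nonce). The random 32-byte nonce hides the message;
    the hash binds the committer to it."""
    nonce = secrets.token_bytes(32)
    commitment = hashlib.sha256(nonce + message).digest()
    return commitment, nonce


def verify(commitment: bytes, nonce: bytes, message: bytes) -> bool:
    """Deterministic check: the same inputs always give the same answer."""
    recomputed = hashlib.sha256(nonce + message).digest()
    return hmac.compare_digest(recomputed, commitment)


c, n = commit(b"balance=100")
print(verify(c, n, b"balance=100"))  # the honest claim checks out
print(verify(c, n, b"balance=101"))  # any altered claim fails the check
```

However capable a model is at producing plausible text, passing `verify` with an altered message would require finding a SHA-256 collision, which no known method does in practice. That is the sense in which a deterministic check can constrain a probabilistic generator.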

The objectivity of Community Notes?

We use crowd-sourced assessments from X’s Community Notes program to examine whether there are partisan differences in the sharing of misleading information. Unlike previous studies, misleadingness here is determined by agreement across a diverse community of platform users, rather than by fact-checkers. We find that 2.3 times more posts by Republicans are flagged as misleading compared to posts by Democrats. These results are not base rate artifacts, as we find no meaningful overrepresentation of Republicans among X users. Our findings provide strong evidence of a partisan asymmetry in misinformation sharing which cannot be attributed to political bias on the part of raters, and indicate that Republicans will be sanctioned more than Democrats even if platforms transition from professional fact-checking to Community Notes.

Here is the full paper.  I guess it agrees with Richard Hanania…
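“Agreement across a diverse community” has a specific mechanical meaning in Community Notes. X has publicly described a matrix-factorization “bridging” model in which each rating is predicted as mu + i_u + i_n + f_u * f_n, and a note's intercept i_n serves as its helpfulness score. Below is a simplified sketch of that idea, not X's production scorer. The function name, learning rate, regularization constants, and toy ratings are all hypothetical.

```python
import random


def fit_note_intercepts(ratings, n_users, n_notes, epochs=300, lr=0.05,
                        lam_intercept=0.15, lam_factor=0.03, seed=0):
    """Fit y_un ~ mu + i_u + i_n + f_u * f_n by SGD on squared error.

    ratings: list of (user_id, note_id, y) with y = 1.0 (helpful) or 0.0.
    Intercepts are regularized more heavily than factors, so a note earns a
    high intercept only when its support is not explained by the 1-D
    viewpoint factor f_u * f_n.
    """
    rng = random.Random(seed)
    mu = 0.0
    i_u = [0.0] * n_users
    i_n = [0.0] * n_notes
    f_u = [rng.uniform(-0.1, 0.1) for _ in range(n_users)]
    f_n = [rng.uniform(-0.1, 0.1) for _ in range(n_notes)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, n, y in data:
            err = mu + i_u[u] + i_n[n] + f_u[u] * f_n[n] - y
            mu -= lr * err
            i_u[u] -= lr * (err + lam_intercept * i_u[u])
            i_n[n] -= lr * (err + lam_intercept * i_n[n])
            fu_old = f_u[u]
            f_u[u] -= lr * (err * f_n[n] + lam_factor * f_u[u])
            f_n[n] -= lr * (err * fu_old + lam_factor * f_n[n])
    return i_n


def toy_ratings():
    """Users 0-3 lean one way, users 4-7 the other (hypothetical data)."""
    ratings = []
    for u in range(8):
        side_a = u < 4
        ratings.append((u, 0, 1.0))  # note 0: rated helpful across the divide
        for n in (1, 3):             # notes 1 and 3: helpful only to side A
            ratings.append((u, n, 1.0 if side_a else 0.0))
        for n in (2, 4):             # notes 2 and 4: helpful only to side B
            ratings.append((u, n, 0.0 if side_a else 1.0))
    return ratings


intercepts = fit_note_intercepts(toy_ratings(), n_users=8, n_notes=5)
for note, score in enumerate(intercepts):
    print(f"note {note}: intercept {score:+.2f}")
```

Note 0 ends up with the highest intercept because its support spans both groups, while the one-sided notes' support is absorbed by the factor term. This is why a note's status depends on cross-group agreement rather than on raw vote counts.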