My Conversation with Philip Tetlock

Here is the audio and transcript, here is part of the summary:

He joined Tyler to discuss whether the world as a whole is becoming harder to predict, whether Goldman Sachs traders can beat forecasters, what inferences we can draw from analyzing the speech of politicians, the importance of interdisciplinary teams, the qualities he looks for in leaders, the reasons he’s skeptical machine learning will outcompete his research team, the year he thinks the ascent of the West became inevitable, how research on counterfactuals can be applied to modern debates, why people with second cultures tend to make better forecasters, how to become more fox-like, and more.

Here is one excerpt:

COWEN: If you could take just a bit of time away from your research and play in your own tournaments, are you as good as your own best superforecasters?

TETLOCK: I don’t think so. I don’t think I have the patience or the temperament for doing it. I did give it a try in the second year of the first set of forecasting tournaments back in 2012, and I monitored the aggregates. We had an aggregation algorithm that was performing very well at the time, and it was outperforming 99.8 percent of the forecasters from whom the composite was derived.

If I simply had predicted what the composite said at each point in time in that tournament, I would have been a super superforecaster. I would have been better than 99.8 percent of the superforecasters. So, even though I knew that it was unlikely that I could outperform the composite, I did research some questions where I thought the composite was excessively aggressive, and I tried to second guess it.

The net result of my efforts — instead of finishing in the top 0.02 percent or whatever, I think I finished in the middle of the superforecaster pack. That doesn’t mean I’m a superforecaster. It just means that when I tried to make a forecast better than the composite, I degraded the accuracy significantly.

COWEN: But what do you think is the kind of patience you’re lacking? Because if I look at your career, you’ve been working on these databases on this topic for what? Over 30 years. That’s incredible patience, right? More patience than most of your superforecasters have shown. Is there some dis-aggregated notion of patience where they have it and you don’t?

TETLOCK: [laughs] Yeah, they have a skill set. In the most recent tournaments, we’ve been working on with them, this becomes even more evident — their willingness to delve into the details of really pretty obscure problems for very minimal compensation is quite extraordinary. They are intrinsically cognitively motivated in a way that is quite remarkable. How am I different from that?

I guess I have a little bit of attention deficit disorder, and my attention tends to roam. I’ve not just worked on forecasting tournaments. I’ve been fairly persistent in pursuing this topic since the mid 1980s. Even before Gorbachev became general party secretary, I was doing a little bit of this. But I’ve been doing a lot of other things as well on the side. My attention tends to roam. I’m interested in taboo tradeoffs. I’m interested in accountability. There’re various things I’ve studied that don’t quite fall in this rubric.

COWEN: Doesn’t that make you more of a fox though? You know something about many different areas. I could ask you about antebellum American discourse before the Civil War, and you would know who had the smart arguments and who didn’t. Right?
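Tetlock's point about the composite can be illustrated with a toy simulation. This is only a sketch, and it is not the aggregation algorithm his team used, which the excerpt does not describe. It assumes hypothetical forecasters who each see the true probability plus independent noise, scores everyone with the Brier score, and takes a plain unweighted average as the composite. Every name and parameter value here is made up for illustration.

```python
import random

def brier(prob, outcome):
    """Squared error between a probability forecast and a 0/1 outcome."""
    return (prob - outcome) ** 2

def mean_brier(probs, outcomes):
    """Average Brier score over all questions (lower is better)."""
    return sum(brier(p, o) for p, o in zip(probs, outcomes)) / len(outcomes)

def simulate(n_forecasters=200, n_questions=100, noise=0.25, seed=0):
    """Hypothetical tournament: each forecaster sees the truth plus noise."""
    rng = random.Random(seed)
    truths = [rng.random() for _ in range(n_questions)]
    outcomes = [1 if rng.random() < p else 0 for p in truths]
    forecasts = [
        # Clip noisy forecasts back into the valid probability range.
        [min(1.0, max(0.0, p + rng.gauss(0, noise))) for p in truths]
        for _ in range(n_forecasters)
    ]
    return forecasts, outcomes

forecasts, outcomes = simulate()
individual_scores = [mean_brier(f, outcomes) for f in forecasts]

# Composite = unweighted mean of all forecasts on each question (an assumption).
composite = [sum(column) / len(column) for column in zip(*forecasts)]
composite_score = mean_brier(composite, outcomes)

# Share of individual forecasters whose score is worse than the composite's.
share_beaten = sum(s > composite_score for s in individual_scores) / len(individual_scores)
print(f"composite Brier: {composite_score:.4f}")
print(f"share of forecasters beaten: {share_beaten:.1%}")
```

Because squared error is convex, the averaged forecast can never score worse than the average individual forecaster, which is part of why second-guessing the composite tends to cost accuracy.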

And another:

TETLOCK:

…I had a very interesting correspondence with in the 1980s about forecasting tournaments. We could talk a little about it later. The upshot of this is that young people who are upwardly mobile see forecasting tournaments as an opportunity to rise. Old people like me and aging baby-boomer types who occupy relatively high status inside organizations see forecasting tournaments as a way to lose.

If I’m a senior analyst inside an intelligence agency, and say I’m on the National Intelligence Council, and I’m an expert on China and the go-to guy for the president on China, and some upstart R&D operation called IARPA says, “Hey, we’re going to run these forecasting tournaments in which we assess how well the analytic community can put probabilities on what Xi Jinping is going to do next.”

And I’ll be on a level playing field, competing against 25-year-olds, and I’m a 65-year-old, how am I likely to react to this proposal, to this new method of doing business? It doesn’t take a lot of empathy or bureaucratic imagination to suppose I’m going to try to nix this thing.

COWEN: Which nation’s government in the world do you think listens to you the most? You may not know, right?

Definitely recommended.

Wednesday assorted links

1. Is the wealth-freedom correlation weakening?

2. Mechanism design to reduce medical supply shortfalls during pandemics.

3. Vox covers Emergent Ventures/Fast Grants.

4. Good summary of the new Los Angeles prevalence results.  And here is a thread of caution.

5. Poems for pandemics.

6. James Hamilton on negative oil prices.

7. Salim Furth blames the automobile, not the NYC subway.  And here is criticism of the subway result from a blogger.  Reading both, my judgment is that the subway result does not hold up.

8. Journal of Controversial Ideas is now open and accepting papers.

9. Using Kalman filtering to estimate R.

10. “Wash Your Hands,” Roaring Lion, Trinidad calypso.

Escape from New York

We look at demographic mobility responses to Covid in NYC using mobile phone GPS, finding – wealthy flee the city – different sheltering response among demographic groups in the city – helps account for disparities in health outcomes

That is from new research by Arpit Gupta, full paper here.  And:

Searches for moving to NYC suburbs are up almost 250% compared to the same period in 2019.

Story here.  Of course maybe those are the same people who in 2016 promised to move to Canada.

Immigration will be largely shut down for some time to come

That is the topic of my Bloomberg column, here is one bit:

Whether or not that reaction is rational, it is easy to imagine the public being fearful about the potential of immigration to contribute to a pandemic resurgence. It does seem that regions able to restrict in-migration relatively easily — such as New Zealand, Iceland and Hawaii — have had less severe Covid-19 problems. New York City, which takes in people from around the world, has had America’s most severe outbreak. And the recent appearance of a second wave of Covid-19 in Singapore has been connected to ongoing migration there.

I have never thought the federal government would build Trump’s wall on the U.S.-Mexico border. But now I wonder whether it may well happen — perhaps in electronic form.

And:

In addition to these effects, many migrants currently living in the U.S. might go back home. Say you are from southern India and live in Atlanta, and typically your parents or grandparents come to visit once a year. That is now much harder for them to do, and will be for the foreseeable future. India also might make it more difficult for Indian-Americans to return to visit their relatives, perhaps demanding an immunity certificate for entry. Many of these current migrants will end up returning home to live in their native countries.

But not all immigration will vanish:

In spite of all those possible restrictions, the pandemic itself may offer new reasons to embrace some forms of migration, if only to help Western economies continue to function. Many jobs are now more dangerous than before, because they involve face-to-face contact and time spent in enclosed spaces. Such professions as nursing and dental assistants, for example, already attracted many immigrants even before Covid-19. Working on farms may yet become more perilous if the virus strikes farm worker communities. New migrants from poorer countries will be willing to take on these risks — for extra income of course — but most U.S. citizens won’t go near them.

The reality may be an uptick in some forms of migration, mostly for relatively hazardous jobs.

In any case, the immigration debate two or three years from now will seem virtually unrecognizable, compared to what we had been expecting.

The Subways Seeded the Massive Coronavirus Epidemic in New York City

New York City’s multitentacled subway system was a major disseminator – if not the principal transmission vehicle – of coronavirus infection during the initial takeoff of the massive epidemic that became evident throughout the city during March 2020. The near shutoff of subway ridership in Manhattan – down by over 90 percent at the end of March – correlates strongly with the substantial increase in the doubling time of new cases in this borough. Maps of subway station turnstile entries, superimposed upon zip code-level maps of reported coronavirus incidence, are strongly consistent with subway-facilitated disease propagation. Local train lines appear to have a higher propensity to transmit infection than express lines. Reciprocal seeding of infection appears to be the best explanation for the emergence of a single hotspot in Midtown West in Manhattan. Bus hubs may have served as secondary transmission routes out to the periphery of the city.

That is from a new NBER working paper by Jeffrey E. Harris.

California estimate of the day

Using daily state-level coronavirus data and a synthetic control research design, we find that California’s statewide SIPO reduced COVID-19 cases by 152,443 to 230,113 and COVID-19 deaths by 1,940 to 4,951 during the first three weeks following its enactment. Conservative back of the envelope calculations suggest that there were approximately 2 to 4 job losses per coronavirus case averted and 108 to 275 jobs losses per life saved during this short-run post-treatment period.

That is from a new NBER working paper by Friedson, McNichols, Sabia, and Dave.  As you probably know by now, I am reluctant to take “how well have we done with death so far” estimates at face value, but there you go.  You now have your California estimate of the day.

Tuesday assorted links

1. Estimating and classifying the labor market hit.

2. Watching Tucker Carlson is safer than watching Hannity.

3. Better governance is correlated with slower policy responses to Covid-19.

4. Tabulated data on asymptomatic infection rates.

5. Singapore getting much worse again (NYT).  And are hospitalizations decelerating in Sweden?

6. How the Belgians count Covid-19 deaths.  I call that one big nursing home fail, and I don’t just mean for Belgium.

7. Research paper with predictions for Stockholm.

8. Claims about heterogeneous strains — please use with extreme caution, I do not consider this verified, though it could be very important if true.

9. Study of France — only about 6% infected, other numbers too.

10. Covid-related deregulation on its way?

11. The Amish health care system.

12. The U.S. as insurer to the rest of the world during crises.

The Japanese coronavirus story

You may recall that some time ago MR posted an anonymous account of how the coronavirus problem actually was much worse in Japan than was being admitted by the Japanese government and broader establishment.  It is now clear that this Cassandra was correct.

I can now reveal to you the full story of that posting behind the first link, including my role in it.  Here is the opening excerpt:

By March 22nd, I strongly suspected there was a widespread coronavirus epidemic in Japan. This was not widely believed at the time. I, working with others, conducted an independent research project. By March 25th we had sufficient certainty to act. We projected that the default course of the epidemic would lead to a public health crisis.

We attempted to disseminate the results to appropriate parties, out of a sense of civic duty. We initially did this privately attached to our identities and publicly but anonymously to maximize the likelihood of being effective and minimize risks to the response effort and to the team. We were successful in accelerating the work of others.

The situation is, as of this writing, still very serious. In retrospect, our pre-registered results were largely correct. I am coming forward with them because the methods we used, and the fact that they arrived at a result correct enough to act upon prior to formal confirmation, may accelerate future work and future responses here and elsewhere.

I am an American. I speak Japanese and live in Tokyo. I have spent my entire adult life in Japan. I have no medical nor epidemiology background. My professional background is as a software engineer and entrepreneur. I presently work in technology. This project was on my own initiative and in my personal capacity.

I am honored to have played a modest role in this story, though full credit goes elsewhere, do read the whole thing.  Hashing plays a key role in the longer narrative.

The Roadmap to Pandemic Resilience

Led by Danielle Allen and Glen Weyl, the Safra Center for Ethics at Harvard has put out a Roadmap to Pandemic Resilience (I am a co-author along with others). It’s the most detailed plan I have yet seen on how to ramp up testing and combine with contact tracing and supported isolation to beat the virus.

One of the most useful parts of the roadmap is that choke points have been identified and solutions proposed. Here are three testing choke points, for example. First, nasal swabs make people sneeze, which means that health care workers collecting the sample need PPE. A saliva test, such as the one just approved, could solve this problem. In addition, as I argued earlier, we need to permit home test kits, especially as a self-swab from the near nasal area appears to be just as accurate as nasal swabs taken by a nurse. Second, once collected, the swab material is classified as a bio-hazard, which imposes serious transport and storage safety requirements. An inactivation buffer, however, could kill the virus without killing the RNA necessary for testing and thus reduce the need for bio-safety techniques in transportation, which would make testing faster and cheaper. Finally, labs are working on reducing the reagents needed for the tests.

Understanding the choke points is a big step towards increasing the quantity of tests.

Economists survey epidemiological models

The authors are Christopher Avery, William Bossert, Adam Clark, Glenn Ellison, and Sara Fisher Ellison.  The paper is very good but the abstract is uninformative.  Here is one excerpt:

A notable shortcoming of the basic SIR model is that it does not allow for heterogeneity in state frequencies and rate constants. We discuss several different sources of heterogeneity in more detail in Section 2.

The most important and challenging heterogeneity in practice is that individual behavior varies over time. In particular, the spread of disease likely induces individuals to make private decisions to limit contacts with other people. Thus, estimates from scenarios that assume unchecked exponential spread of disease, such as the reported figures from the Imperial College model of 500,000 deaths in the UK and 2.2 million in the United States, do not correspond to the behavioral responses one expects in practice. Further, these gradual increases in “social-distancing” that can be expected over the course of an epidemic change dynamics in a continuous fashion and thus blur the distinctions between mechanistic and phenomenological models. Each type of model can be reasonably well calibrated to an initial period of spread of disease, but further assumptions, often necessarily ad hoc in nature, are needed to extend either type of model to later phases of an epidemic.

I recommend the whole paper.
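To make the excerpt's point concrete, here is a minimal sketch of a basic SIR model with and without a behavioral response. It is not taken from the paper. The assumption that people cut contacts in proportion to current prevalence (`beta / (1 + response * i)`), the parameter values, and the function name `run_sir` are all hypothetical, chosen only to show how an ad hoc behavioral rule changes the dynamics.

```python
def run_sir(beta, gamma, days, response=0.0, i0=1e-4, dt=0.1):
    """Euler-step SIR in population fractions.

    beta: baseline transmission rate per day
    gamma: recovery rate per day
    response: hypothetical strength of voluntary contact reduction;
              0.0 reproduces the basic SIR model
    """
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for _ in range(int(days / dt)):
        # Assumed behavioral rule: contacts fall as current prevalence rises.
        beta_eff = beta / (1.0 + response * i)
        new_infections = beta_eff * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return {"peak_infected": peak, "ever_infected": 1.0 - s}

# Basic reproduction number is beta / gamma = 2.5 in both runs.
unchecked = run_sir(beta=0.5, gamma=0.2, days=365)
with_response = run_sir(beta=0.5, gamma=0.2, days=365, response=50.0)

for label, result in [("unchecked", unchecked), ("with response", with_response)]:
    print(f"{label}: peak {result['peak_infected']:.1%}, "
          f"ever infected {result['ever_infected']:.1%}")
```

Both runs start from the same basic reproduction number, yet the run with a behavioral response has a lower peak and fewer people ever infected over the year, which is why unchecked-spread scenarios are a poor guide to what actually happens.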

From my email

One thing that some people fail to realize is the following:  This disease will have about the same fraction of the population infected plus recovered in the post-lockdown equilibrium regardless of the policy path that gets us there. However, that does not mean that the number of dead is the same for all policies, because the infection fatality rate is so heterogeneous with this disease.

The (often unrecognized) elephant in the room is that one set of policies may sort the least vulnerable population to be infected first while another set of policies may sort the most vulnerable population to be infected last.  Protecting the most vulnerable effectively while infecting the least vulnerable quickly could theoretically save almost everyone for this particular disease.

Since the old and sick people often live in relative “lockdown” even at normal times, the general lockdown does the opposite of beneficial sorting by slowing down infections among the least vulnerable.  The general lockdown kills more people over the whole epidemic by tilting the sorting in an unfavorable direction.

The hospital crowding is in my opinion a relatively unimportant issue compared to this because there is no effective “silver bullet” therapy for the disease.

That one is anonymous!  And from another reader:

A lot of people are citing a paper that looks at the impact of general lockdowns on ultimate deaths (over a 24-month window) during the 1918 Spanish flu epidemic in the US. It’s important to understand that the 1918 epidemic and the 2020 epidemic have sorting effects that go in opposite directions.

The 1918 disease was most dangerous to people with strong immune systems (young adults), and those people were also the ones that were most active in society and had most interpersonal contacts. Absent any general lockdown, those people were infected first and didn’t benefit from the long-run equilibrium of “herd immunity.” The general lockdowns during the 1918 disease epidemic reduced these vulnerable people’s infection probability relatively more than that of the less vulnerable people.  This improved sorting and thereby saved lives.

The 2020 disease works in the opposite way. It is the most dangerous to old, sick people with weakest immune systems.  Those people are relatively inactive at normal times and don’t have a large number of social contacts. The general lockdown increases those vulnerable people’s relative infection probability, because their routine doesn’t change much while less vulnerable people social distance. This adverse sorting due to general lockdowns causes more deaths, in theory at least.

In my opinion, the 1918 lockdown evidence should be interpreted as evidence of the importance of sorting, not evidence that general lockdowns are the right thing to do now.
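The sorting argument in these two emails is at bottom arithmetic, so a small worked example may help. This sketch takes the readers' premise as given (the same total share of the population ends up infected under every policy) and only varies who gets infected. The two risk groups, their population shares, and their infection fatality rates are hypothetical numbers, not estimates from any study.

```python
def deaths_per_100k(infected_share_by_group, ifr_by_group):
    """Deaths per 100,000 residents.

    infected_share_by_group: share of the TOTAL population that is both in
    the group and infected, so the values sum to the overall attack rate.
    """
    return 100_000 * sum(
        infected_share_by_group[group] * ifr_by_group[group]
        for group in infected_share_by_group
    )

# Hypothetical population: 80% low-risk, 20% high-risk.
ifr = {"low_risk": 0.001, "high_risk": 0.05}

# Each scenario infects 60% of the total population; only the mix differs.
scenarios = {
    "no sorting (proportional)": {"low_risk": 0.48, "high_risk": 0.12},
    "favorable sorting": {"low_risk": 0.60, "high_risk": 0.00},
    "adverse sorting": {"low_risk": 0.40, "high_risk": 0.20},
}

for name, infected in scenarios.items():
    total_infected = sum(infected.values())
    print(f"{name}: {total_infected:.0%} infected, "
          f"{deaths_per_100k(infected, ifr):.0f} deaths per 100k")
```

With these made-up inputs the death toll ranges from 60 to 1,040 per 100,000 at an identical 60% attack rate. Whether real-world policies actually produce favorable or adverse sorting is the empirical question the emails raise, and this arithmetic does not settle it.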

Assorted non-Covid links

1. Bad trade and the loss of variety.

2. Can money buy happiness revisited: the new take is to hire a happiness agent.

3. Do people have a bias for low-deductible insurance? (yes, partly for peace of mind reasons)

4. Weird Phillips curve behavior has to do with costs, not degree of tightness in the labor market.

5. New results on Harvard discrimination against Asian-Americans.  “Asian Americans are substantially stronger than whites on the observables associated with admissions…the richness of the data yields a model that predicts admissions extremely well. Our preferred model shows that Asian Americans would be admitted at a rate 19% higher absent this penalty.”

6. New Devon Zuegel podcast with Alain and Marie-Agnes Bertaud.

7. Janet Yellen teaches on YouTube.

What should we believe and not believe about R?

This is from my email, highly recommended, and I will not apply further indentation:

“Although there’s a lot of pre-peer-reviewed and strongly-incorrect work out there, I’ll single out Kevin Systrom’s rt.live as being deeply problematic. Estimating R from noisy real-world data when you don’t know the underlying model is fundamentally difficult, but a minimal baseline capability is to get sign(R-1) right (at least when |R-1| isn’t small), and rt.live is going to often be badly (and confidently) wrong about that because it fails to account for how the confirmed count data it’s based on is noisy enough to be mostly garbage. (Many serious modelers have given up on case counts and just model death counts.) For an obvious example, consider their graph for WA: it’s deeply implausible on its face that WA had R=.24 on 10 April and R=1.4 on 17 April. (In an epidemiological model with fixed waiting times, the implication would be that infectious people started interacting with non-infectious people five times as often over the course of a week with no policy changes.) Digging into the data and the math, you can see that a few days of falling case counts will make the system confident of a very low R, and a few days of rising counts will make it confident of a very high one, but we know from other sources that both can and do happen due to changes in test and test processing availability. (There are additional serious methodological problems with rt.live, but trying to nowcast R from observed case counts is already garbage-in so will be garbage-out.)

However, folks are (understandably, given the difficulty and the rush) missing a lot of harder stuff too. You linked a study and wrote “Good and extensive west coast Kaiser data set, and further evidence that R doesn’t fall nearly as much as you might wish for.” We read the study tonight, and the data set seems great and important, but we don’t buy the claims about R at all — we think there are major statistical issues. (I could go into it if you want, although it’s fairly subtle, and of course there’s some chance that *we’re* wrong…)

Ultimately, the models and statistics in the field aren’t designed to handle rapidly changing R, and everything is made much worse by the massive inconsistencies in the observed data. R itself is a surprisingly subtle concept (especially in changing systems): for instance, rt.live uses a simple relationship between R and the observed rate of growth, but their claimed relationship only holds for the simplest SIR model (not epidemiologically plausible at all for COVID-19), and it has as an input the median serial interval, which is also substantially uncertain for COVID-19 (they treat it as a known constant). These things make it easy to badly misestimate R. Usually these errors pull or push R away from 1 — rt.live would at least get sign(R – 1) right if their data weren’t garbage and they fixed other statistical problems — but of course getting sign(R – 1) right is a low bar, it’s just figuring out whether what you’re observing is growing or shrinking. Many folks would actually be better off not trying to forecast R and just looking carefully at whether they believe the thing they’re observing is growing or shrinking and how quickly.

All that said, the growing (not total, but mostly shared) consensus among both folks I’ve talked to inside Google and with academic epidemiologists who are thinking hard about this is:

  • Lockdowns, including Western-style lockdowns, very likely drive R substantially below 1 (say .7 or lower), even without perfect compliance. Best evidence is the daily death graphs from Italy, Spain, and probably France (their data’s a mess): those were some non-perfect lockdowns (compared to China), and you see a clear peak followed by a clear decline after basically one time constant (people who died at peak were getting infected right around the lockdown). If R was > 1 you’d see exponential growth up to herd immunity, if R was 0.9 you’d see a much bigger and later peak (there’s a lot of momentum in these systems). This is good news if true (and we think it’s probably true), since it means there’s at least some room to relax policy while keeping things under control. Another implication is the “first wave” is going to end over the next month-ish, as IHME and UTexas (my preferred public deaths forecaster; they don’t do R) predict.
  • Cases are of course massively undercounted, but the weight of evidence is that they’re *probably* not *so* massively undercounted that we’re anywhere near herd immunity (though this would of course be great news). Looking at Iceland, Diamond Princess, the other studies, the flaws in the Stanford study, we’re very likely still at < ~2-3% infected in the US. (25% in large parts of NYC wouldn’t be a shock though).

Anyways, I guess my single biggest point is that if you see a result that says something about R, there’s a very good chance it’s just mathematically broken or observationally broken and isn’t actually saying that thing at all.”

That is all from Rif A. Saurous, Research Director at Google, currently working on COVID-19 modeling.

Currently it seems to me that those are the smartest and best informed views “out there,” so at least for now they are my views too.
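For readers who want to see the garbage-in problem in miniature, here is a rough sketch. It is not rt.live's code. It assumes the simplest SIR-style relationship between growth and R (growth rate = (R - 1) / serial interval), an assumed serial interval of 7 days, and a made-up week in which true infections shrink steadily at R = 0.8 while the share of cases that get reported first dips and then rebounds as a testing backlog clears.

```python
import math

def r_from_growth(cases_start, cases_end, days, serial_interval):
    """Back out R from exponential growth between two case counts.

    Assumes growth rate = (R - 1) / serial_interval, which holds only for
    the simplest SIR-type model.
    """
    growth_rate = math.log(cases_end / cases_start) / days
    return 1.0 + growth_rate * serial_interval

SERIAL_INTERVAL = 7.0  # days; an assumed value, not a known constant
TRUE_R = 0.8           # hypothetical true reproduction number

# True daily cases decline smoothly over days 0 through 6.
true_cases = [
    1000.0 * math.exp((TRUE_R - 1.0) / SERIAL_INTERVAL * day) for day in range(7)
]

# Hypothetical reporting multipliers: testing slows, then a backlog clears.
reporting = [1.0, 0.9, 0.8, 0.8, 1.0, 1.25, 1.25]
reported = [cases * factor for cases, factor in zip(true_cases, reporting)]

r_clean = r_from_growth(true_cases[0], true_cases[6], 6, SERIAL_INTERVAL)
r_dip = r_from_growth(reported[0], reported[3], 3, SERIAL_INTERVAL)
r_backlog = r_from_growth(reported[3], reported[6], 3, SERIAL_INTERVAL)

print(f"R from true cases:        {r_clean:.2f}")
print(f"R from reported, days 0-3: {r_dip:.2f}")
print(f"R from reported, days 3-6: {r_backlog:.2f}")
```

In this toy week the estimate from reported counts swings from about 0.28 to about 1.84 while the true value never leaves 0.8, so even sign(R - 1) comes out wrong in the second window. The jump comes entirely from the reporting multipliers, much like the Washington example in the email.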

Monday assorted links

1. “Field-specific training is not relevant among the most talented PhDs because the performance gap between economics or finance PhDs and other PhDs disappears among published PhDs.”

2. An extensive and pretty devastating article on the testing fail of the CDC.  Again, our regulatory state has been failing us.  And coverage from the NYT.

3. At the margin: “Results show that informants were given approximately 70 East German marks worth of rewards more per year in the areas that had access to WGTV, as compared with areas with no reception—ironically an amount roughly equivalent to the cost of an annual East German TV subscription.”

4. “Bars and Restaurants Peel Cash From Walls to Help Idled Workers” (NYT).

5. Scott Sumner: watch the islands.  This piece seems to imply that in-migration is a major source of heterogeneity.  I’ve also been receiving some emails from Xavier suggesting tourist inflow is a major cause of heterogeneity, due to an ever fresh supply of hard to trace cases.  No rigorous test yet of that one, but it is certainly in the running as a hypothesis.  And if true, it suggests many parts of Africa may not be hit that hard.

6. Karlson, Stern, and Klein on Sweden.

7. South Africa and HIV/AIDS: will the latter have been good training for Covid-19? (Economist)

8. The danger of “herd immunity overshoot.”

9. Singapore government and the Virus Vanguard.

10. Beloit College moves to more flexible two-course module system.  For now at least.

Lockdown socialism will collapse

Under Lockdown Socialism:

–you can stay in your residence, but paying rent or paying your mortgage is optional.

–you can obtain groceries and shop on line, but having a job is optional.

–other people work at farms, factories, and distribution services to make sure that you have food on the table, but you can sit at home waiting for a vaccine.

–people still work in nursing homes that have lost so many patients that they no longer have enough revenue to make payroll.

–professors and teachers are paid even though schools are shut down.

–police protect your property even though they are at risk for catching the virus and criminals are being set free.

–state and local governments will continue paying employees even though sales tax revenue has collapsed.

–if you own a small business, you don’t need revenue, because the government will keep sending checks.

–if you own shares in an airline, a bank, or other fragile corporations, don’t worry, the Treasury will work something out.

This might not be sustainable.

That is from Arnold Kling.  Too many of our elites are a little shy about pushing this message out there.