Estimating the COVID-19 Infection Rate: Anatomy of an Inference Problem

That is a recent paper by Manski and Molinari, both top people in econometrics. Here is the abstract:

As a consequence of missing data on tests for infection and imperfect accuracy of tests, reported rates of population infection by the SARS CoV-2 virus are lower than actual rates of infection. Hence, reported rates of severe illness conditional on infection are higher than actual rates. Understanding the time path of the COVID-19 pandemic has been hampered by the absence of bounds on infection rates that are credible and informative. This paper explains the logical problem of bounding these rates and reports illustrative findings, using data from Illinois, New York, and Italy. We combine the data with assumptions on the infection rate in the untested population and on the accuracy of the tests that appear credible in the current context. We find that the infection rate might be substantially higher than reported. We also find that the infection fatality rate in Italy is substantially lower than reported.

Here is a very good tweet storm on their methods, excerpt: “What I love about this paper is its humility in the face of uncertainty.” And: “…rather than trying to get exact answers using strong assumptions about who opts-in for testing, the characteristics of the tests themselves, etc, they start with what we can credibly know about each to build bounds on each of these quantities of interest.”
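
To make the bounding idea concrete, here is a minimal Python sketch. It is not the authors' procedure, only the law of total probability applied to tested and untested groups. It assumes tests are perfectly accurate (the paper does not) and that the untested are infected at a rate no higher than the tested, which is my simplified reading of their approach. The function name, parameters, and numbers are all hypothetical.

```python
def infection_rate_bounds(tested_share, infected_rate_tested,
                          untested_low=0.0, untested_high=None):
    """Bound the population infection rate when only some people are tested.

    P(infected) = P(infected | tested) * P(tested)
                + P(infected | untested) * P(untested)

    The last conditional rate is unobserved, so we only assume it lies in
    [untested_low, untested_high]. By default (an assumption, not a fact),
    the upper end is the rate observed among the tested.
    """
    if untested_high is None:
        untested_high = infected_rate_tested
    if not 0.0 <= untested_low <= untested_high <= 1.0:
        raise ValueError("need 0 <= untested_low <= untested_high <= 1")

    known_part = tested_share * infected_rate_tested
    untested_share = 1.0 - tested_share
    lower = known_part + untested_share * untested_low
    upper = known_part + untested_share * untested_high
    return lower, upper


# Hypothetical inputs: 1% of the population tested, 20% of those positive.
lower, upper = infection_rate_bounds(tested_share=0.01, infected_rate_tested=0.20)
print(f"population infection rate is between {lower:.3f} and {upper:.3f}")
```

With so few people tested, the interval is wide, and that is the point: the width reflects what the data alone cannot tell us. Narrowing it requires either more testing or stronger assumptions about the untested.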

I genuinely cannot give a coherent account of “what is going on” with Covid-19 data issues and prevalence. But at this point I think it is safe to say that the mainstream story we have been living with for some number of weeks now just isn’t holding up.

For the pointer I thank David Joslin.