Estimating the COVID-19 Infection Rate: Anatomy of an Inference Problem

That is a recent paper by Manski and Molinari, top people in econometrics. Here is the abstract:

As a consequence of missing data on tests for infection and imperfect accuracy of tests, reported rates of population infection by the SARS CoV-2 virus are lower than actual rates of infection. Hence, reported rates of severe illness conditional on infection are higher than actual rates. Understanding the time path of the COVID-19 pandemic has been hampered by the absence of bounds on infection rates that are credible and informative. This paper explains the logical problem of bounding these rates and reports illustrative findings, using data from Illinois, New York, and Italy. We combine the data with assumptions on the infection rate in the untested population and on the accuracy of the tests that appear credible in the current context. We find that the infection rate might be substantially higher than reported. We also find that the infection fatality rate in Italy is substantially lower than reported.

Here is a very good tweet storm on their methods, excerpt: “What I love about this paper is its humility in the face of uncertainty.”  And: “…rather than trying to get exact answers using strong assumptions about who opts-in for testing, the characteristics of the tests themselves, etc, they start with what we can credibly know about each to build bounds on each of these quantities of interest.”

I genuinely cannot give a coherent account of “what is going on” with Covid-19 data issues and prevalence.  But at this point I think it is safe to say that the mainstream story we have been living with for some number of weeks now just isn’t holding up.

For the pointer I thank David Joslin.


The base of infected might actually be higher based on a recent study finding co-infection with other respiratory illnesses.

So, if you presented to a doctor with flu symptoms, they would check first for standard flu, and not for covid, because covid testing was scarce and because of the belief that covid and standard flu did not co-exist. If they found standard flu, they did not test for covid.

This recent JAMA paper finds there is significant co-infection, which means some people had regular flu and covid but were not diagnosed with covid. I haven't downloaded the paper to find the percentages, and this study is limited to a small region.

Not being snarky in the least here, but what is THE mainstream story? I am unable to even pretend that there is a single coherent narrative.

You'd think if there was a single flawed-by-being-too-cautious story (this is what I'm assuming TC believes, based on his comments re: epidemiologists), big business would be falling all over themselves to fan the "Open the economy up!" flames. Maybe it's happening beyond my field of vision.

I suspect it is referring to the numbers. This many cases, this many deaths, this many hospitalizations.

I don't know anyone who thinks that these numbers are complete. The way I look at them is for change; if the testing criteria are unchanged, then the numbers indicate a direction.

What interests me is whether the disappearance of media organizations has severed one of the feedback mechanisms that everyone depends on. The agenda media is pretty useless, and most of the big players fit into that description. I'm surprised by some of the ignorant decisions made in lots of places, as if there isn't a way where information gets spread, or even a way of knowing that there is information that needs to be looked up. Even something as simple and basic as a journalist calling up all the hospitals and nursing homes every few days to get an indication of how things are going doesn't seem to be happening. And that type of poking about both informs and prods.

"if the testing criteria are unchanged, then ..."

And if not? It's a bog within a marsh surrounded by a swamp inside a morass. Nobody knows.

If you are aware of the uncertainties or issues, having data is still of value compared to having no data. Start working and living with uncertainties, my friend.

Having dud data may have negative value. If you've never experienced that in life you must be very young. Or blind.

Re: This many cases, this many deaths, this many hospitalizations.
I don't know anyone who thinks that these numbers are complete.


The claim above and this one don't make sense together: your claim that the "media organizations" are derelict in getting this data.

Derek, each state's Department of Health collects this data, not news organizations.

Didn't you know that state Departments of Health collect this data?

Here's an exercise: google your state department of health and look for the statistics; post below if you did not find them and disclose the state and I will find them for you.

Good luck. If you don't find it, post below. Otherwise, I will assume you found it.

In addition, to my non-economist eyes these authors seem to be trying to make the most epistemologically cautious claims possible.
The epidemiologists I work with all ‘understand uncertainty’ (whatever that means) but also have to make concrete predictions for the future which inform government policy.

I don’t mean to be snarky either, but I’m yet to see Tyler identify a specific example of a major mistake by an epidemiological group - instead there’s a lot of vague guided criticism and some zingers about GRE scores.

- +1 on the uselessness/dangerousness of agenda media
- examples of possible mistakes:
  - the assertion that flight restrictions are racist
  - mebbe the mask guidelines?
  - the testing system hasn't gone too smoothly

The Center for Evidence Based Medicine has been on this for a month and a half:

I like the methods, but the bounds on the infection rate they find are very wide (0.8% to 64.5% in New York). It seems to me they didn’t consider important information, namely the proportion of symptomatic people who actually get a test.
I think it’s low. Perhaps they thought it was unreliable, but I believe they should have investigated this more. Consider this quote:
“Now, most patients in New York City will not be given a COVID-19 test unless they’ve been hospitalized – even then being tested for the virus isn’t a guarantee. Several people have already shared their accounts of being deprived of a test, despite displaying symptoms of the virus, presumably because the state must ration tests, which remain in relatively short supply.”
I think this would have raised the lower bound significantly.

I also saw lots of anecdotal evidence from ER personnel on Twitter that this was indeed the case.

Korean data seems pure.

If I’m reading their table right, their bounds for the infection fatality rate are between 0% and 8.6%... this is informative? I think we already all knew it’s not as high as 9%. What does this exercise actually teach us? Or is this noteworthy just because it was written by high status econometricians who “embrace uncertainty”?

Thanks Tyler; great paper! 

And agreed, Ram.

1. From page 17-18: 

"The bound on the probability of needing intensive care is narrow, being [0, 0.02]. The fatality rate on April 6 lies in the bound [0.001, 0.086]. It is notable that this upper bound on fatality is substantially lower than the fatality rate among confirmed infected individuals, which was 0.125 on April 6."

An IFR somewhere between 0.1% and 8.6% ...! That's beautifully uncertain. 

I think the 0.125 comes from 16,523 (fatalities) / 132,547 (cases). Both numbers found via Worldometer.
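As a quick check of that arithmetic, here is a minimal Python sketch. The two counts are the ones quoted in this comment; the function name is only for illustration.

```python
def case_fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """Deaths divided by confirmed cases (a CFR, not an IFR)."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return deaths / confirmed_cases


# Counts quoted in the comment for Italy on April 6.
italy_cfr = case_fatality_rate(16523, 132547)
print(round(italy_cfr, 3))
```

The ratio is a little under 0.125 before rounding, so the reported figure is consistent with these two counts.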

2. Re: 

"It is notable that this upper bound on fatality is substantially lower than the fatality rate among confirmed infected individuals, which was 0.125 on April 6."

- like you, maybe I find it a bit less "notable" than the authors, i.e. no one credible is currently arguing CFR = IFR, or that 12.5% is even a remotely plausible true IFR.

It seems like lines are being drawn around 0.12% (lower bound, via the Bendavid Santa Clara study) and 1% (the absolute highest I could find someone speculating, and this was Scott Gottlieb only hinting at it as a possibility in a Twitter thread).

3. Maybe the meta here is that:  

a. It IS notable that we have an upper bound lower than CFR, and that we can reasonably estimate it.

b. We're more certain than we should be when foodfighting over Santa Clara-type studies.

c. False negatives are underrated. (They bound at somewhere between 0.1 and 0.4.) 

I have only skimmed the paper, but is it not a little odd that the upper bound for fatality is so much higher than the upper bound for ICU admission? It seems implausible that 5x more people will die than will make it to the ICU.

I assume it's just beyond the scope of their model to incorporate that constraint, but it seems like important information.

It seems illogical, but could it be because some people never made it to the ICU (died at home)?

That can happen, but a factor of 5 more doesn't seem reasonable. New York City just started counting people who died at home as "plausible" Covid-19 deaths, but that includes people who died of secondary effects like neglect, heart attacks, people afraid to go to the hospital, possibly excess deaths due to increased rates of drug overdose, suicide, and other lock-down side effects. In any case, the at-home deaths are at most 50% of the hospital deaths.

Yes, something is wrong with their number. In this JAMA study of 17.7k Covid-19 positives in Lombardy, 9% were admitted to the ICU.

The vast majority of countries have not been testing or reporting deaths outside the hospital. Belgium is one of the few countries reporting on nursing home deaths, and has found that almost half (44%) of documented COVID-19 deaths are occurring in nursing homes. In the Netherlands, in the last two weeks there has been an "excess mortality" of over 4,300 people. That's just the last two weeks, and it exceeds the total number of documented deaths. They also note that the death rate in institutional settings is more than 50% higher than expected (prisons, homes for the disabled, residential housing for asylum seekers, etc.). None of these deaths are going into the statistics, and that doesn't touch on those who are dying at home. In many countries, the actual number of COVID-19 related deaths may be close to double what is being documented, but at this point it's a guess.

Also, regarding hospitalized patients, not every patient makes it to the ICU before they die. Some die too rapidly before an ICU bed is available, and for some it's a triage decision to save the ICU for others.

The Economist has an article on total excess deaths vs confirmed Covid19 deaths.

Paywalled so screengrabbed - (mods here may remove I guess, if this is not on).

Some countries may record better than others. By the end of the period studied, England+Wales hit about 33% of excess deaths not attributed to Covid19, while the Dutch get to about 60% of excess deaths not attributed to Covid19. NYC and London both have a relatively low degree of under-recording in data, about 20%.

Interestingly Lombardy's total excess deaths begin trending down approximately 20th March, before Covid19 deaths did, confirming that they were in fact "half-way through the infection" (about the time Michael Levitt predicted this to be the case) and were at the peak at this time.

(I'd guess care home deaths probably only represent a fraction, perhaps 1/2 at most, of excess deaths not attributed to Covid19).

Depends on the country I think. In the US the dominant thinking among the general public seems to be that nearly every patient should be ‘given a chance’ and transferred to ICU.
In other countries (most of Europe, certainly Australia/NZ where I have worked) many of these patients (old, highly comorbid) won’t be transferred to ICU.
The ratio of icu admissions to death is going to be partly a function of this.

Not bad, but they seem to make some questionable assumptions while refraining from making some useful assumptions or using a wider set of data. And of course their bounds are so wide that they pretty much tell us nothing that we don't already know or have guessed, outside of the truly outlandish figures that a few people have suggested for infection rates.

The questionable assumptions are these three:

"We have assumed that persons who recover from the COVID-19 disease become immune and, hence, cannot be infected anew. We have assumed that persons who are tested and receive a negative result are not retested subsequently. We have assumed that hospitals correctly diagnose patients and that public records correctly code causes of death."

The first one is one that we perhaps have to make for the time being, or at least for an initial model. The second one was probably appropriate a few weeks ago and maybe is still appropriate but I suspect that it's becoming less so; certainly Romer's proffered plan (granted it's as implausible as Hanson's) involves doing the opposite of that assumption. And the third is I think wildly off the mark, unless by fantastic luck IL, NY, and Italy are all coding deaths and disease the same way. But it's not a matter of "correct" vs "incorrect"; it's how they choose to measure things.

OTOH, we would surely learn more by looking at a wider set of states and countries. They do acknowledge this and if I'm understanding their point, they're treating each of IL, NY, and Italy as a population and not as a sample. So there is no statistical inference to be made, instead it's a question of accurately measuring the population statistics. To add say California and South Korea to the analysis would not provide any additional information, it would simply give them two more populations to look at.

If they did treat these as samples rather than populations, they'd have to try to model the sampling process, adding a whole layer of complexity and additional assumptions needed.

But though that's not the paper that they wrote, that's the paper that I was hoping for: one that did try to aggregate or integrate worldwide data, making appropriate corrections or weightings or deflations for data of questionable quality and accounting for the differences between say CA and NY.

That is of course a huge ask, and quite possibly our data are of inadequate quantity and quality to do the task. Manski is an example of a researcher who I think might be up to the task; in this paper they say they didn't attempt the task -- are they also telling us that it shouldn't be attempted, that such ambitious models are hopelessly ambitious right now?

For fuck's sake, give us the stupid-wide bounds, okay. But for the love of god, make a few more somewhat reasonable assumptions and show what that does to the bounds.

i think there's some misunderstanding about the 'results' in the m&m paper. the results are not intended to be read as 'look at these highly informative results'. the way to understand the ranges is as 'these are the values of the number we are interested in that are consistent with the data we have' under the monotonicity assumptions they invoke.

if you calculate the ranges that would result in the absence of the monotonicity assumptions, the difference between those and the reported results would give you information on the effect of these particular assumptions. same with the ranges resulting from other assumptions.
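To make the point about assumptions concrete, here is a minimal Python sketch of how bounds of this kind can be computed from the law of total probability. It is a simplification and not the paper's actual procedure: it ignores test inaccuracy, it takes "monotonicity" to mean that the infection rate among the untested is no higher than among the tested, and the input numbers are hypothetical.

```python
def infection_rate_bounds(tested_share, positive_rate, monotone=True):
    """Return (lower, upper) bounds on the population infection rate.

    Uses P(inf) = P(inf | tested) * P(tested) + P(inf | untested) * P(untested),
    where P(inf | untested) is unobserved. Assumes tests are perfectly
    accurate, which is a simplification made for this sketch.
    """
    if not (0.0 <= tested_share <= 1.0 and 0.0 <= positive_rate <= 1.0):
        raise ValueError("shares and rates must lie in [0, 1]")
    untested_share = 1.0 - tested_share
    # Lower bound: nobody in the untested group is infected.
    lower = positive_rate * tested_share
    # Upper bound: the untested rate is capped at the tested rate under
    # monotonicity, or at 1 when no assumption is made.
    max_untested_rate = positive_rate if monotone else 1.0
    upper = lower + max_untested_rate * untested_share
    return lower, upper


# Hypothetical inputs: 2% of the population tested, 40% of tests positive.
with_assumption = infection_rate_bounds(0.02, 0.40, monotone=True)
without_assumption = infection_rate_bounds(0.02, 0.40, monotone=False)
print("with monotonicity:   ", with_assumption)
print("without monotonicity:", without_assumption)
```

The lower bound is the same in both cases; only the upper bound moves, and the gap between the two upper bounds is what this particular assumption buys.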

chuck's been pounding on the (lack of) plausibility of assumptions that produce point-identification in these kinds of problems since the late 80s. i hope that, finally, his work gets the attention it deserves . . .

people in the comments misunderstand the psychology at work here. yes, they provide very wide bounds. this work is an intermediate form. it is allowing catastrophists to nudge away from their existing position without fully acknowledging their failure. TC has been reticent to move from his mood affiliation with the “WE WILL BE ITALY IN ONE WEEK” crowd (btw, i wish someone would put together a digest of those tweets for posterity). perhaps the data was not there, or not credible enough for some. but there have been signs, clear signs, that existing case counts and stated fatality rates were ridiculous.

this is not to buttress the “it’s just the flu” crowd on conservative media. they’re right for the wrong reasons, but even a broken clock is right twice a day.

i would probably, on balance, prefer MR’s considerable intellectual resources take every possible civilizational emergency at the most extreme. just in case.

but i also worry that maybe we could have used such a credible voice of moderation earlier.

this study will provide cover for “serious” people to transition to the belief that COVID is far more widespread, and far less deadly than advertised. and yes, advertised is the appropriate word.

happy saturday and enjoy your negative growth. i guess we remain stubbornly attached.

Probably. But do we really care if the screamers are able to make such a pivot? (Or "exclamation point and caps" if we prefer.) So long as, where applicable to them, there's a little humility in apologizing fairly publicly for the term "Corona Truther" and for the likes of calling John Ioannidis motivated by dark money (seriously seen that posited), why not then let everyone get on with testing whether mitigation using measures short of indefinite lockdown is a more balanced approach than "permanent lockdown until vaccine" suppression? If the goal is unification around a new consensus, let it happen, with the minimum of necessary public humility (no matter how much a lot more public humility might be satisfying).

"It could be really bad, or it could be not so bad at all. We don't have enough credible data to tell" is very important information.

"But at this point I think it is safe to say that the mainstream story we have been living with for some number of weeks now just isn’t holding up."

And this has been obvious for three weeks if tracking Western European countries, Japan, Korea and the U.S.

The relevant point to me is that we are finally using Bayes-induction methods. The base rate on TV or world-o-meters is the WORST case. What are my odds of dying when I get Covid? Similarly, what are my odds of dying in a fire? If severe fires reported in the news have a death rate of 10%, then my odds of dying in a fire are 10%, right? But what if I learn that only 5% of fires become severe enough to be reported at all? What are my odds now? We are watching the figures and inferring a higher incidence... just like we do with shark attacks at a beach until we do the calculation of healthy visitors annually. This is the first article/paper I’ve seen that seems to understand that. Thanks for posting. Covid is just much less deadly than we thought, which means we require a new plan. I know we all hate changing narratives, but the facts changed... they do that sometimes. Trump is an idiot, but we’re going to put him back in the White House if we make it this easy.
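The fire arithmetic can be worked through in a few lines of Python. This is only a sketch of the analogy: the 10% and 5% figures are the ones from the comment, and the assumption that fires too mild to be reported kill nobody is added here to get a single number.

```python
def overall_death_rate(severe_share, death_rate_severe, death_rate_mild=0.0):
    """Death rate across all fires, weighting severe and mild ones.

    death_rate_mild defaults to 0.0, which is an assumption of this sketch:
    fires too mild to make the news are taken to be harmless.
    """
    for value in (severe_share, death_rate_severe, death_rate_mild):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie in [0, 1]")
    mild_share = 1.0 - severe_share
    return severe_share * death_rate_severe + mild_share * death_rate_mild


# 5% of fires are severe enough to be reported; 10% of those are fatal.
odds_given_any_fire = overall_death_rate(0.05, 0.10)
print(f"{odds_given_any_fire:.3%}")
```

Under that assumption the odds fall from 10% to about 0.5%, a factor of twenty, which is the same correction that moves a case fatality rate toward an infection fatality rate.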

@JP - since you understand statistics, see my post below yours. It all depends on your priors...that is how Bayes induction works, and it's not a cure all. Assume the best, and get odds for the best. Have you been infected? And if so, how many have been infected? Will C-19 burn out? Depends on the R0.

"What are my odds of dying when I get Covid? Similarly, what are my odds of dying in a fire? If severe fires reported in the news have a death rate of 10%, then my odds of dying in a fire is 10%, right? But what if I learn that only 5% of fires become severe enough to be reported at all? What are my odds now?"

It's depressing how well this analogy captures what is actually happening.

From today's news, which probably belongs in the other post but it's too crowded there: "One third of participants in Massachusetts study tested positive for antibodies linked to coronavirus".

So do the herd immunity math: 1-1/R0 = 0.33, solve for R0, R0=1.49 for this to be "good news". By most estimates, Covid-19 has an R0 = 1.4-5.7, so indeed this might be good news if Covid-19 has a low R0. But is the low R0 because of the social distancing and lockdowns? If so, we need even more social distancing and longer lockdowns.
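That calculation uses the textbook herd immunity threshold, 1 - 1/R0, from the simplest homogeneous-mixing model. A minimal Python sketch, with function names chosen only for illustration:

```python
def herd_immunity_threshold(r0):
    """Share of the population that must be immune, 1 - 1/R0."""
    if r0 <= 1.0:
        raise ValueError("the formula only applies when R0 > 1")
    return 1.0 - 1.0 / r0


def r0_from_threshold(threshold):
    """Invert the formula: the R0 for which `threshold` is exactly enough."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must lie in [0, 1)")
    return 1.0 / (1.0 - threshold)


# One third immune, as in the Massachusetts headline.
implied_r0 = r0_from_threshold(0.33)
print(round(implied_r0, 2))

# Thresholds at the two ends of the R0 range quoted in the comment.
low_end = herd_immunity_threshold(1.4)
high_end = herd_immunity_threshold(5.7)
print(round(low_end, 2), round(high_end, 2))
```

At the low end of the quoted R0 range the threshold is under 30%, while at the high end it is above 80%, so whether 33% counts as good news depends almost entirely on which R0 is right.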

I can help them raise the lower bound on fatality rate. They have .001. But there have been 12000 deaths counted due to covid19 in Lombardy. At that fatality rate, 12mil infections in Lombardy would be expected. The actual pop is 10mil.
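The consistency check in that comment can be sketched in Python. The death count and population are the round numbers given in the comment, not official figures, and the function names are only for illustration.

```python
def implied_infections(deaths, fatality_rate):
    """Number of infections needed to produce `deaths` at a given IFR."""
    if fatality_rate <= 0:
        raise ValueError("fatality_rate must be positive")
    return deaths / fatality_rate


def minimum_fatality_rate(deaths, population):
    """Smallest IFR possible if literally everyone had been infected."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population


# Round numbers from the comment for Lombardy.
deaths = 12_000
population = 10_000_000
paper_lower_bound = 0.001

needed = implied_infections(deaths, paper_lower_bound)
floor = minimum_fatality_rate(deaths, population)
print(needed > population)
print(floor)
```

With these inputs the implied infections exceed the population, so the lower bound on the fatality rate cannot be below deaths divided by population, about 0.0012 here.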

Note that actual deaths due to covid19 are likely to be much higher than reported.

From relatively early on I've been following this and heard a case fatality estimate of 0.6% (S. Korea) as well as experts assert that they suspected the ultimate rate will be much lower than 1-2% but higher than the flu (0.1%).

I think the obvious problem here is that the fatality rate is a social construct. By that I mean you cannot divorce it from the state society is in. If all things are going well and you can get the standard of care, it is probably around 0.4-0.6%, a few times worse than the flu. If health care is overtaxed, they are duct taping you to a ventilator someone else is using, and nurses are wearing coffee filters and rubber bands as masks, the fatality rate is going to be higher.

Model this disease as a type of denial of service attack. If the web servers can cope with the traffic, you won't see anything. Otherwise the site is offline.

I see these authors as asking, what can we know about these parameters with any real confidence based on data revealed so far? Their answer is, not a lot, really.

That's useful.

Was this a similar argument,
