Epidemiology and selection problems and further heterogeneities

Richard Lowery emails me this:

I saw your post about epidemiologists today. I have a concern similar to point 4, about selection, based on what I have seen being used for policy in Austin. It looks to me like the models being used for projection calibrate R_0 off of the initial doubling rate of the outbreak in an area. But, if people who are most likely to spread to a large number of people are also more likely to get infected early in an outbreak, you end up with what looks kind of like a classic Heckman selection problem, right? In any observable group, there is going to be an unobserved distribution of contact frequency, and it would seem potentially first order to account for that.

As far as I can tell, if this criticism holds, the models are going to (1) be biased upward, predicting a far higher peak in the absence of policy intervention and (2) overstate the likely severity of an outcome without policy intervention, while potentially understating the value of aggressive containment measures.  The epidemiology models I have seen look really pessimistic, and they seem like they can only justify any intervention by arguing that the health sector will be overwhelmed, which now appears unlikely in a lot of places.  The Austin report did a trick of cutting off the time axis to hide that total infections do not seem to change that much under the different social distancing policies; everything just gets dragged out.

But, if the selection concern is right, the pessimism might be misplaced if the late-epidemic R_0 is lower, potentially leading to a much lower effective spread rate and the possibility of killing the thing off at some point before it infects the number of people required to create the level of immunity the models predict is required. This seems feasible based on South Korea and maybe China, at least for areas in the US that are not already out of control.
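To see the selection effect in miniature: here is a toy sketch in Python (all parameters are illustrative, and this is emphatically not the Austin model). Individuals get heterogeneous activity levels, both transmission and the chance of getting infected scale with activity, and we compare the early generation-to-generation growth (what a doubling-rate calibration sees) with the population-average individual R.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 200_000
    R_mean = 2.5                      # assumed population-average individual R
    act = rng.gamma(0.5, 2.0, N)      # heterogeneous activity, mean 1 (illustrative)
    prob = act / act.sum()            # contacts land on people in proportion to activity

    susceptible = np.ones(N, dtype=bool)
    infected = rng.choice(N, 10, replace=False, p=prob)  # early cases skew high-activity
    susceptible[infected] = False
    gens = [infected.size]

    while infected.size and gens[-1] < N // 20:          # stop before depletion matters
        n_contacts = rng.poisson(R_mean * act[infected]).sum()
        targets = rng.choice(N, n_contacts, p=prob)
        new = np.unique(targets[susceptible[targets]])   # contacts that hit susceptibles
        susceptible[new] = False
        infected = new
        gens.append(new.size)

    print("population-average individual R:", R_mean)
    print("early growth factor a doubling-rate fit sees:", round(gens[2] / gens[1], 1))

Under these assumptions the early growth factor runs near R_mean times E[a^2]/E[a], about three times the population average for this activity distribution, so a homogeneous model calibrated to the initial doubling rate starts out biased upward.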

I do not know the answers to the questions raised here, but I do see the debate on Twitter becoming more partisan, more emotional, and less substantive.  You cannot say that about this communication.  From the MR comments this one — from Kronrad — struck me as significant:

One thing both economists and epidemiologists seem to be lacking is an awareness of the problems of aggregation. Most models in both fields see the population as one homogeneous mass of individuals. But sometimes, individual variation makes a difference in the aggregate, even if the average is the same.

In the case of pandemics, it makes a big difference how the infection rate varies in the population. Most models assume that it is the same for everyone. But in reality, human interactions are not evenly distributed. Some people shake hands all day, while others spend their days mostly alone in front of a screen. This uneven distribution has an interesting effect: those who spread the virus the most are also the most likely to get it. This means that the infection rate looks much higher at the beginning of a pandemic, but sinks once the super spreaders have had the disease and gained immunity. Also, it means herd immunity is reached much earlier: not after 70% of the population is immune, but after the people who are involved in 70% of all human interactions are immune. On average, this is the same. But in practice, it can make a big difference.

I did a small simulation on this and came to the conclusion that with a recursively applied Pareto distribution, where 1/3 of all people are responsible for 2/3 of all human interaction, herd immunity is already reached when 10% of the population has had the virus. So individual variation in the infection rate can make an enormous difference that is not captured in aggregate models.

My quick and dirty simulation can be found here:
https://github.com/meisserecon/corona
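For readers who want something self-contained, here is a cruder stand-in in the same spirit (not Kronrod's code: just two groups calibrated to his 1/3-vs-2/3 rule, so the top third has four times the per-capita activity of the rest, with an assumed R0 of 2.5). It will not reproduce his 10% figure, but it shows the herd immunity point, where effective R crosses 1 and prevalence peaks, arriving well before the homogeneous 1 - 1/R0 threshold, with a lower final attack rate as well.

    import numpy as np

    R0 = 2.5                     # assumed population-average basic reproduction number
    f = np.array([1/3, 2/3])     # "1/3 of all people..."
    a = np.array([2.0, 0.5])     # "...responsible for 2/3 of all interaction"
    Ea = (f * a).sum()           # mean activity, equal to 1 by construction

    x = f.copy()                 # susceptible share of the whole population, by group
    i = np.array([1e-6, 1e-6])   # tiny seed of infecteds
    dt = 0.001
    cum, cum_at_peak, peak_prev = i.sum(), 0.0, 0.0

    for _ in range(int(60 / dt)):
        force = R0 * (a * i).sum() / Ea     # infection pressure per unit of activity
        new = x * a * force * dt            # high-activity people get infected first
        x -= new
        i += new - i * dt                   # mean infectious period of 1 time unit
        cum += new.sum()
        if i.sum() > peak_prev:             # prevalence peaks roughly when R_eff = 1
            peak_prev, cum_at_peak = i.sum(), cum

    print(f"homogeneous threshold 1 - 1/R0:  {1 - 1/R0:.0%}")
    print(f"infected when prevalence peaks:  {cum_at_peak:.0%}")
    print(f"final attack rate:               {cum:.0%}")

With only two groups the drop is modest; heavier-tailed contact distributions (his recursive Pareto, or anything with the same mean and much larger variance) push both numbers down further, and how far depends heavily on the assumed R0.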

See also Robin Hanson’s earlier post on variation in R0.  C’mon people, stop your yapping on Twitter and write some decent blog posts on these issues.  I know you can do it.

Comments

Yes! These nonlinear effects hobble both simple plans (open or closed) and simple models of compliance (those looking at "degree" of agreement).

The first problem is the data problem. The incidence is higher than is being reported. There are strategies designed to ensure the data is understated. For example, Trump has excused corporations from providing timely and accurate data. Deaths which are not tested as virus deaths are not recorded as virus deaths, also understating the occurrence. Relying on the federal database in making decisions is to use bad data. There are better databases not infected with the virus of manipulation. The Atlanta Journal-Constitution just reported on the rank corruption the Governor has been utilizing in Georgia. The official count is understated partly because testing is nowhere near where it needs to be, and various parties are working to ensure it stays that way.

It's true that in normal times we don't worry too much about what other people die of, especially far away.

Suddenly it would be nice to be a full autopsy nation.

("The percentage of deaths for which an autopsy was performed declined more than 50 percent from 1972 through 2007, from 19.3 percent to 8.5 percent ...")

Now regress “likelihood of autopsy” on the murder rate and age of death. And now you understand why it’s declined.

Or not, statistics are clearly out of your depth.

We bet the fellow's bold claim that
"herd immunity is already reached when 10% of the population has had the virus"
is not gonna reproduce.

That's not his point. He never claimed that his model was accurately calibrated; just that most models ignore this effect, and plugging in some values for it (probably not correct, but not obviously completely crazy) has a huge effect. Also, herd immunity doesn't mean that only 10% of people get infected, just that 10% is the inflection point where things switch from steadily growing to starting to die out. The actual portion of the population infected in the end would be larger (potentially significantly, e.g. several times larger) depending on various factors.

His point simply boils down to this: the people most likely to spread it (widely) are probably also the people most likely to catch it early, so R should fall more quickly over the course of the epidemic than a simple SIR model would predict, and the total population that ends up infected will also be lower than SIR models predict. The more this varies among the population, the stronger this effect.
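To put numbers on the threshold-versus-final-size distinction: even in the plain homogeneous SIR model, the epidemic overshoots the herd immunity threshold 1 - 1/R0, and the final attack rate z solves the textbook final-size relation z = 1 - exp(-R0*z). A quick check (standard relation; R0 values picked from this thread):

    import math

    def final_size(r0, iters=500):
        """Fixed-point iteration for the SIR final-size relation z = 1 - exp(-r0*z)."""
        z = 0.9
        for _ in range(iters):
            z = 1 - math.exp(-r0 * z)
        return z

    for r0 in (1.1, 2.2, 5.7):
        print(f"R0 = {r0}: threshold {1 - 1/r0:.0%}, final attack rate {final_size(r0):.0%}")

For R0 = 1.1 the threshold is about 9% but the final attack rate is roughly double that, which is exactly the overshoot the comment above is pointing at.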

This thread describes a very interesting simulation from a mathematician, using connected graphs, that basically explores the same ideas about heterogeneity in R0.

https://twitter.com/gro_tsen/status/1241745979663155203

He also claims that we observe lower total infection rates in the real world than would be expected from simple SIR models.

Also excellent.

On a side note, for the difficulty this mathematician encountered when he tried to put his argument on arXiv, see
https://mathoverflow.net/questions/357077/note-rejected-from-arxiv-what-to-do-next/357153#comment897738_357153

Wow, that thread is from March 22, before it was obvious that things wouldn't blow up. That thread should have had way more visibility.

what am I showing you

__________
___-----_______

ask anyone who turns 102 this year

Now that Tyler has outed himself as an anti-elite bad person, we can ignore what he has to say.

Blindsided again!

From what has been inflicted on the world, I wonder if epidemiologists, the pandemic professoriate, are not akin to Fed economists.

We either need 9,000 more of them or we need to fire all of them and start over.

This alleged overestimation makes a lot of sense to me, now that I see it explained. Of course, I did not think of that myself. But I am not an epidemiologist, let alone a smart one.

However, it seems to me that this theory, if it holds, might be above my intelligence, but it is still basic enough to have been considered by at least one member of the legions of epidemiologists who have existed since the inception of epidemiology, whenever that was.

What can explain that the models seem not to consider this apparently very important point? Might it be that overestimating is actually a feature, not a bug? It makes sense, considering the incentives of epidemiologists. The more serious the issue, the more important they become in the eyes of society, and the more the budgets of their departments increase. Again, might it be just a Public Choice issue?

Overestimation is also a feature of the climate models. There's a tendency to make the situation look worse; it justifies your paycheck some. Why do we need so many climate scientists if the situation is under control?

Has that ever made sense? Total revenues for the oil and gas drilling sector were $3.3 trillion in 2019, but somehow the financial incentives were *still* with environmentalists?

No, it sounds like the answer is somehow making it out in spite of financial incentives.

Very obvious incentives running the other way.

The environmentalists have their own incentives: their careers, their reputations, their convictions. Regardless, the model predictions run hot versus the actual warming.

Now explain why the Arctic climate models have been so flawed, as the actual reported data varies considerably from the more comforting consensus predictions, which involved a longer time scale for what has been observed over the last decade.

Climate science modelling can be deeply flawed, and the Arctic is a prime example of just how unreliable it is.

That was good? It just made me feel old and tired. When faced with a hard, multivariate prediction problem .. just name your political enemies? I guess, if you want to tell us who you are.

As far as "the Arctic as a prime example" .. it's a region, not a model, for goodness sake. The number of models pertaining to the region is unbounded.

Old and tired x 2.

In this thread anonymous believes any phenomenon only has financial incentives that impact one party and are unidirectional.

Groundbreaking. If he's right, every firm can increase its prices to infinity tomorrow.

Also he’s solved corruption and the principal agent problem in one response.

Brilliant as always

"The more serious the pandemic, the more epidemiologists become important in the eyes of society, and the more the budget of their departments increase."

A hammer wants a nail. An epidemiologist wants a shutdown.

>I do see the debate on Twitter becoming more partisan, more emotional, and less substantive.

This coming from a solidly-partisan Tyler Cowen?

I can only conclude that you've realized you are losing.

I don't know if Tyler Cowen is solidly partisan, but I agree with him that the debate is becoming more partisan and less substantive, and with you that it is because one side of the debate (the side in favor of aggressive lockdown) is realizing it is losing.

Scientifically, I never saw much substance coming from this side of the debate (it was mostly argument from authority, emotional links to articles describing horrible deaths, or "evidence that can't be made public"). Now that evidence is coming in that the mortality rate is everywhere much lower than what was supposed to justify the lockdown, and that in most places (like NYC) the health care system did not come even close to saturation, destroying the whole "flattening-the-curve" argument, I consider the scientific debate closed (doing pure math is more interesting anyway). There remains the political debate, and of course it is more emotional and less substantive.

Please stop being partisan. The epidemiologists became irrelevant when policy makers failed to heed their advice six weeks ago. NZ heeded that advice and is heading to zero virus in two weeks (with a death rate below 0.5%). Do you not understand that you are seeing deaths and economic disruption due to US failure of governance (not science) arising from partisan behaviours?

"I do not know the answers to the questions raised here"

Collect data and test people. It really is that simple in places like South Korea or Germany, and these days, Italy.

And ignore twitter completely, of course. Along with people whose models are not based on any actual data, not that an absence of data appears to be a problem for economists.

And as a final tip, anyone using the term herd immunity without reference to a vaccine is easily dismissed.

Please grace us with why herd immunity cannot be used without referencing a vaccine.

I'm here to learn.

He’s an expert, he was fired from the GMU public relations department for incompetence 30 years ago.

Now he’s here to share his wisdom

On Twitter: Over the past week or two I've looked at Twitter more, to see if there's anything useful I'm missing. I've reaffirmed that Twitter is awful -- just noise and bias-confirmation. I can understand how it's addictive, with a steady stream of novelty and a low bar to accessing it, but the lack of substance is really obvious, and the net result is simply to increase anxiety or cocoon people into bubbles. The only good things on it are (1) pointers to articles and blog posts; (2) tweets that are basically just images, for example of interesting foods; (3) glimpses into the psyches of groups that one would never intersect with, for example odd political fringes of different countries. I'm not sure #3 is actually insightful, rather than oddly entertaining. There are a handful of Twitter threads that have something vaguely approaching the structure and depth of a real argument, but these just highlight how poor Twitter is for this sort of layout, and I really can't understand why the tweet isn't instead: "here's a link to a real post or article."

I get a little disgusted when someone posts a fifteen-tweet thread to make their argument, using the 1/15, 2/15, 3/15 numbering system and all.

Good thing blogs are dead and newspapers are gone.

The whole twitter thing is funny. It makes me think of carrying on a debate by telegraph.

First 10 word comment. STOP. Second 10 word comment. STOP.

10 word reply. STOP.

It is intrinsically not a medium for any serious discussion.

I agree. I used to think Twitter was some new and possibly useful type of writing. But we've been writing for a few millennia now and maybe we've already figured out what works. Almost the only time I get value out of Twitter is when there is a multi-tweet post that takes enough time to make a real argument (versus send out a telegram). And what is a multi-tweet thread but a BLOG ARTICLE that is unfortunately chopped into little pieces? I give up. Trying to learn from Twitter is like trying to learn from skywriting. (And like mkt42 below I sometimes go to it when sent by someone writing a coherent blog post -- but again, this just shows the weakness of Twitter, if one has to rely on someone else to "curate" it.)

Yes, I never go to twitter, with but two exceptions.

If someone such as Tyler links to it, then thanks to his selection/curation (as well as my estimation of whether I should click or not; I do so maybe 1/3 of the time) there's at least a 50% chance I'll find some interesting tweets. Without that, the chance of finding something worthwhile is less than 1%.

Once or twice a day Twitter generates an email to me with tweets that its algorithms think I might find interesting. After perhaps a decade it's finally become semi-decent at that. I'll click on perhaps 10% of their proffered tweets, and be satisfied perhaps half the time.

What continues to drive me nuts is when someone, especially the people who are allegedly good tweeters, posts a tweet that says "read this, it's great" and then a link -- with no explanation of why it's great. I've learned not to bother clicking on those, unless I have nothing better to do with my time. Even if it's a person who generally posts quality tweets, that link is usually disappointing. When they provide some description of what it's about, I can better decide if I want to click on it or not.

Your last paragraph needs to be highlighted. Out of principle, only click when it is evident a person has put effort into their tweet.

I did a small simulation on this and came to the conclusion that with a recursively applied Pareto distribution, where 1/3 of all people are responsible for 2/3 of all human interaction, herd immunity is already reached when 10% of the population has had the virus.

interesting. Maybe he can explain this in a conference of "experts" who all share their similarly quick-and-dirty Excel macros.

It should be obvious that it would be moronic to make policy decisions based on these hypotheses, but the experts are so in love with their models that I fear someone might go ahead and push for "let it rip" herd immunity based on this.

Sigh. Endless September theme. Seems like people are so stupid about numbers. Only I know. No man is an island, except Ray Lopez...

Let's do the math (again), class. Critical threshold (%) for herd immunity = 1 - 1/R0. Hence, if R0 = 2.2, then 1 - 1/2.2 = 55%. But if R0 = 5.7, as claimed by the paper below*, herd immunity = 1 - 1/5.7 = 82%, much higher.

Hence if, due to 'social distancing' (or heterogeneous groups of people, some of them 'spreaders' and some 'stay-at-home wallflowers', same thing), R0 = 1.1, then "herd immunity" results from a mere 9% of the population being infected. Great!! But it presupposes social distancing stays in place for as long as it takes for herd immunity to kill off the C-19 virus, and that might be six months or longer?! Not so great. On the other hand, if indeed R0 = 5.7 for the C-19 virus (without any social distancing), then a gut-wrenching 82% of the population must get infected before C-19 dies off. And that means grandma dies.

Got it now? Somehow I doubt it...

*The paper argues that R0 = 5.7 without any social distancing. https://wwwnc.cdc.gov/eid/article/26/7/20-0282_article - Volume 26, Number 7, July 2020: High Contagiousness and Rapid Spread of Severe Acute Respiratory Syndrome Coronavirus 2.

"I did a small simulation on this and came to the conclusion that with a recursively applied Pareto distribution, where 1/3 of all people are responsible for 2/3 of all human interaction, herd immunity is already reached when 10% of the population has had the virus."

I don't follow the math of why herd immunity would be reached at 10% under these assumptions.

The possibility that only 10% of the population could get COVID is less likely to lead to let-it-rip behavior than the perceived eventuality that 70% of the population will get it. You doofus.

This model doesn't actually argue for 'let it rip' behavior; it depends a bit on the parameters, but if anything it argues for the opposite.

It's the standard models that actually support "let it rip" more strongly, because restrictions don't have that strong an effect on the end total number of people infected; it just takes longer. So you either drive it down to (near) extinction, justify restrictions entirely by keeping hospitals operational and saving lives that way, or use them as a delay until better treatments arrive.

Here, in addition, early restrictive action limits the spread while the people most likely to spread it widely (being themselves among the early ones affected) have it, and since R falls more quickly over time in this model, that leads to a more substantial reduction in the total number of people infected. It argues that policies should start very restrictive and then ease up slowly over time (though the exact policies and how they interact with various populations matter).

"One thing both economists and epidemiologists seem to be lacking is an awareness for the problems of aggregation"

Kronrod's (not Kronrad's) post was fine, although economists would probably call it a problem of heterogeneity rather than of aggregation but that's a minor semantic quibble.

The problem with this call, not just by Kronrod but by innumerable commenters and pundits, for corrections for heterogeneity, confounding factors, covariates, etc. is that in the early stages of the epidemic we lack the data to accomplish this sort of analysis.

We can try to make all sorts of nice upgrades of our models, but we quickly find that our data sets have insufficient observations or missing variables or both.

We can make some crude and obvious observations and corrections for age: older people are more vulnerable. Maybe gender: males seem to be more vulnerable, but is that due to gender or to other variables?

But only now are we starting to get data that allows us to go beyond that with much confidence. County level data in the US is probably still not useful except for finding that the virus started in big cities and spread elsewhere. Even country level data is questionable -- what was different in Italy? Can China's data be trusted?

One small advantage that economists have: the importance of distinguishing between endogenous vs exogenous variables is built into their thinking. Most analysts recognize that R0 is an endogenous variable and not a constant, but economists are more likely to build models that attempt to analyze its determinants (note again however the limitations of the data) and to recognize the limitations of models that utilize R0 but that lack a good way of determining what the value of R0 will be.

"Even country level data is questionable -- what was different in Italy? Can China's data be trusted?"

Given that the Trump-appointed administration fires those who "tell the truth", like the guy in charge of the people running a big boat, can any US data be trusted?

Trump denies that the sources of outbreaks in the US preceded his "China ban", and that the worse strain came from Europe before his "EU ban", because the EU has open borders between states just as the US has open borders between States. A medical company meeting in Boston and tourists in Florida and Louisiana were major sources of shedding outbreaks around the US.

"The" CDC has documented a couple of cases, a person going to Chicago for a birthday party and funeral, the people who traveled after the Biogen conference, a lavish party in a NYC suburb, but has not organized distributed test capacity into building a coherent database on infections and spread.

Instead the "data" comes from a number of independent groups long involved in tracing infection spread, or public interest groups building their own databases, because the CDC, Trump, and most State governments aren't.

Of course, data that is comprehensive requires a massive and very costly deep State: it requires paying workers to do testing, to write reports, to collect reports, to analyze reports, and to follow up on questions the analysis triggers. When you have ten thousand people collecting data, say for the mandated vital records, people want both more data and more action taken from the data, so 10,000 workers grow to 100,000 workers, and all the users of the data demand the data continue and be more accurate, requiring paying more workers.

E.g., to prevent waste, fraud, and abuse, the "death" vital record must be reported to the Federal government quickly. But that data is only name, birth info, Social Security number, and place of death. And it does not go to the CDC, but to part of Treasury. The CDC gets death data from State vital records reports as much as 15-18 months later.

People Trump appointed want data collection privatized so it can be sold for profit. Notably weather data, but private for profit vital records would be both a source of power and profit.

Nothing in that post contributed to this conversation. If your goal is to be partisan, please visit your choice of Fox/MSNBC. Thanks.

Your point is valid for situations where we lack data. But this case is not about a lack of data. It is about an inherent bias of the standard models: they systematically overestimate the infection rate at early stages of a pandemic due to assuming homogeneity. Yes, we do not know exactly how heterogeneous R is, but that does not justify working with the most extreme implicit assumption of total homogeneity.

If nothing else it should inform the uncertainty range, and it does have policy implications. It means that a herd immunity approach is more viable, but that it should be accomplished with initially strong restrictions that relax over time, rather than any kind of "let it rip" scenario (at least assuming that the restrictions have a homogeneous effect, which of course they don't, so the restrictions and their effect need some modeling assumptions for their heterogeneity). There is of course heterogeneity in the outcome of being infected to consider as well. I think something like that British paper's suggestion, that when the lockdowns start to be lifted it be done by age band (e.g. 20-30 year olds who live alone or with other 20-30 year olds being allowed out of lockdown first), makes sense, though I'm not sure it's feasible from a policy perspective.

That is an extremely simple question that the heterogeneity modellers should have no problem answering. Schools were in session until the lockdown, and children certainly spread the virus, so it should be no problem for such models to already provide answers on how much schools are involved in spread.

And if such models are unable to provide any insight into elementary or high school spread, then it might make sense to gather more data on a previously unknown virus instead of wasting time playing with models.

We need to know a lot more about the demography and lifestyles of the infected versus the non-infected: occupations, hobbies, family composition, commuting methods, work space density, meeting frequency, store shopping frequency, and the like. Collecting this ought to be part of the flow of getting tested.

Right. This is very important to reopening the economy.

For example, what is the infection / hospitalization / death rate among flight attendants? Is it higher or lower than for other occupations? If we knew the flight attendant rate, we could approximate the relative risk of being an airline passenger, which could help us predict the future of airlines, which could help investors make money on the stock market.

R0 does have a large variance, which most models ignore for simplicity. There are the super spreaders. The observation that if you're more likely to infect others, you're more likely to be infected yourself is a good one.
There is probably a lower limit to the size of an infection cluster below which it stops and doesn't propagate.
There are hermits who infect no one. In the middle are a bunch of average people who still have to go to school, to work, to church, to a gym class, go out to lunch, and might see a movie, take a trip, and the like.
Many clusters seem to start from an event, and I am not sure they can be modeled as coming from a super spreader: the Gangelt carnival near Heinsberg, the choir in Washington state, the Kirkland Life Care nursing home (started from a birthday party there), the soccer match in Lombardy, the church gathering in Korea.
I think a single infected man traveling on a plane infected 37 people on that same plane with SARS in 2002, and SARS may be less infectious than Covid-19.
Such events will occur for the average person, especially before the economy is in shutdown.

"late epidemic R_0"

Isn't R_0 defined as R at t=0? How does "late epidemic R_0" make sense? Do these models really keep R constant?

They don't keep R constant, but they do usually assume that it's uniform over the population (so the only reduction in R is from x% of people already having had it; e.g. if R_0 was 2.0, then it would be modeled as 1.0 once 50% of people had already had the disease).
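In code form, assuming the standard homogeneous SIR bookkeeping that comment describes:

    def r_effective(r0: float, susceptible_fraction: float) -> float:
        """Homogeneous SIR: R_eff(t) = R0 * S(t)/N, the only way R declines."""
        return r0 * susceptible_fraction

    print(r_effective(2.0, 0.5))  # 1.0 once half the population has had it

The heterogeneity argument in this thread is that the susceptibles who remain late in an epidemic are systematically lower-contact people, so the true R_eff falls faster than this linear bookkeeping implies.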

So individual variation in the infection rate can make an enormous difference that is not captured in aggregate models.
----
Not at equilibrium. The hospital entry and exit rates have to be stable. So hospitals expect the occasional burst in R0 and allocate reserve bed space for the expected occurrence. The hospitals act like a hedge fund.

Epidemiological models were way off when it came to asbestos. The main models had asbestos-related disease diagnoses peaking two decades ago, with a dramatic drop-off as old people exposed decades ago died off from natural causes.

As you can see from all the asbestos-attorney commercials, the model was off. Attorneys also found ways to ramp up testing in areas that were not tested before.

The bottom line is all of these models should be treated as being extremely sensitive to the underlying assumptions, which are often imperfect.

This post starts with the premise that epidemiologists do not use network analytics (which includes clustering, etc., and network structure) in analyzing spread.

Far from the truth.

Why don't you look at the research that has studied epidemics from the network perspective?

Look at econ Prof. Jackson's work at Stanford where he has analyzed health and financial contagion models.

The level of discussion in the post and comments reflects, sadly, that persons have not looked very deeply into this subject, and simply speak from their own limited knowledge.

Pick up Prof. Jackson's books on the economics of networks, which have a ton of stuff on pandemics and network structures, and on disease transmission models and network structure. If you are just discovering this now, as the post and comments suggest, that reflects a real lack of knowledge which should be corrected.

Start with Matthew O. Jackson, Social and Economic Networks (disease transmission models and networks). Here is a link to a recent comment he made on the need to coordinate across countries to respond to this crisis: https://news.stanford.edu/press-releases/2020/03/26/coordinated-respnavirus-pandemic/

If you want to look at the models, just google "epidemiological models and networks". You will see the papers that epidemiologists...and some economists...write on this subject as it relates to disease and other subjects.

Likewise, it took me less than a minute on Wikipedia to find a discussion of heterogeneous models in epidemiology.
If it's going to come down to a pissing contest, my guess would be that epidemiologists are better scientists than economists, because they know they aren't going to get rich from it.

I’m familiar with Dr Jackson’s work.

Please point to one example of his work being used in any CDC publication/press conference or mainstream news analysis of Corona-chan. Anywhere. At all. It’s not.

Every model being pushed is based on homogeneity.

Now you understand the blog post! Yay! Learning is fun

The blog post was ostensibly about "epidemiology", not about the Straussian CDC or about mainstream news analysis.

Skeptical,

How are you familiar with his work?

I say you are a phoney. You haven't read anything of his work.

Evidently you have not read his book Social and Economic Networks, because there are well over 45 pages on this subject, not including diffusion, random graph, and other models, in addition to SIR and SIS models and the specific discussions of disease transmission across all models.

A challenge to anyone reading Skeptical's comment: buy Jackson's book and read it.

Skeptical, you are a Phoney.

But, if the selection concern is right, the pessimism might be misplaced if the late-epidemic R_0 is lower, potentially leading to a much lower effective spread rate and the possibility of killing the thing off at some point before it infects the number of people required to create the level of immunity the models predict is required.
----
Right.
If the lockdown is focused on containing the virus on a neighborhood basis, what is the probability that all neighborhoods are immune? This is equivalent to asking the probability a hospital will go out of business. This is a tough one; the math guys are still reading the proofs on this. The main problem with the model is that the hospital adapts: it removes some of its reserve beds and lays off some workers as the virus decreases. It is very difficult to get absolute probabilities in an adaptive system. There are no obvious arbitrage points to measure; the hospital, to reduce costs, is constantly adjusting its hedges.

Here is the only closed-loop study that I'm aware of that uses real data: it's a small German town with a high incidence of SARS-CoV-2, and researchers were able to go in and get serological data on a representative sample.

I don't follow Twitter, but I do read all the pre-prints that are coming out. I've read my share of modeling papers, and some of them have been spot on about the date when the case load peaked. I've read some of Hanson's posts, but as with the others, it's all conjecture and not worth getting in a lather about. Serological testing is the only reliable way to establish a baseline. Other countries seem to be doing this, yet there is only one test deemed ready here in the US.

Link: https://www.land.nrw/sites/default/files/asset/document/zwischenergebnis_covid19_case_study_gangelt_en.pdf

I discuss a related problem with epidemiological models in section 2 of this article. Those models ignore social circles, local clustering, and the regularity of contacts, especially the close contacts that matter the most for Covid-19 transmission.

https://medium.com/@daniilgor/the-5-dubious-pillars-of-the-covid-19-panic-narrative-eeea2a21309a

Spreading person to person, this makes sense to me. Spreading location to location, it makes less sense.

People talking to Bill, the popular guy in the office, means Bill’s more likely to catch it. Bill should get sick early. We all get sick too.

Kate works in a supermarket. She is 17 and healthy. As people pay her at the till, they meet her for the first time. She touches all their goods. Talks to them good-naturedly. Kate gets sick, and so do a lot of people. While she is sick, Tom mans the register. Tom is 18, healthy.

If the graph is just human social connections, then the sparsity, with a few hubs, means the channels of infection decline quickly. The open paths close quickly in the graph. If the graph is more dense, like shared use of city space, and the least connected nodes all still have at least several connections, then there is always another path. Especially if immunity only lasts 3-24 months and the virus manages to stay alive in the meantime.

I like the thinking behind the graph model. To what extent can locations be nodes, either through indirect contact (recycled air, touching the same surface) or through recycled personnel (everyone on the counter gets sick, but we always replace them)?
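In the spirit of the linked thread, here is a small sketch (assumed parameters throughout, using networkx) that runs the same per-edge transmission process on a hub-heavy graph and on a uniform-degree graph with a similar mean degree, so you can compare how much of the population the open paths ultimately reach:

    import random
    import networkx as nx

    def attack_rate(G, p_transmit=0.3, n_seeds=5, seed=0):
        """Discrete-generation SIR on a graph: each infected node gets one chance
        to infect each susceptible neighbor, with probability p_transmit."""
        rng = random.Random(seed)
        infected = set(rng.sample(list(G.nodes), n_seeds))
        recovered = set()
        while infected:
            new = set()
            for u in infected:
                for v in G.neighbors(u):
                    if v not in infected and v not in recovered and v not in new:
                        if rng.random() < p_transmit:
                            new.add(v)
            recovered |= infected
            infected = new
        return len(recovered) / G.number_of_nodes()

    n = 20_000
    hubby = nx.barabasi_albert_graph(n, 3, seed=1)  # heavy-tailed degrees: a few big hubs
    even = nx.random_regular_graph(6, n, seed=1)    # everyone has degree 6, no hubs

    print("hub-heavy graph:      ", attack_rate(hubby))
    print("uniform-degree graph: ", attack_rate(even))

Nothing stops the nodes from being locations rather than people: a shared checkout counter is just a high-degree node whose edges get refreshed when staff rotate, which is one way to model the "we always replace them" effect.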

If the graph is just human social connections then the sparsity, with a few hubs
---
In NYC those hubs include half the population riding packed subways to work for weeks. Your social networks do not show up until the mass meetings are dispersed, simply because the hospitals do not have the liquidity to cover the massive gatherings and the social networks.
Hospitals are like our Congress, trying to cover mass subsidies to the states and also cover small regular payments to social networks: not enough liquidity, and Congress cycles. Hospitals will cycle until the mass gatherings are hedged one way or the other; and it seems they have been.

New York City largely exists as a giant agglomeration in order to facilitate face to face contact among national and global elites.

Nice to see some epidemiological thinking rather than idiocy in the weeds of mathematical modeling. I guess you already realised you have explained why infection spread in a rural hinterland is slow even with poor "social distancing", in comparison to a dense metro area even with good social distancing.

I don't see why you'd take that email seriously enough to broadcast.

"It looks to me like the models being used for projection calibrate R_0 off of the initial doubling rate of the outbreak in an area. ... the models are going to (1) be biased upward, predicting a far higher peak in the absence of policy intervention"

I don't see how the author gets to "far higher". If he thinks R0 naturally becomes 2 rather than 3, then this means herd immunity at 50% rather than 67%. Is the latter "far higher"? Not relative to what makes this relevant, namely things like hospital capacity. Both would mean far more demand than availability. Lombardy confirmed cases are at 0.5% of population, I think. Maybe that's really off and it is like 2%, 3%, etc. Maybe we have more capacity. Still, the change in modelling hasn't made any real difference with respect to the problem that I can see. Does the author think R0 magically goes to <1 naturally *at less than, say, 5% of the population exposed*? He should say so, if he thinks this is any remote possibility. I don't see how.

"The epidemiology models I have seen look really pessimistic, and they seem like they can only justify any intervention by arguing that the health sector will be overwhelmed, which now appears unlikely in a lot of places. "

This seems like the obvious and well-publicized mistake: intervention worked, and so you say it wasn't necessary. Does the author think that this appears unlikely due to some natural process? What could it be? Or does he think it unlikely because of social distancing policy? In which case, this unlikelihood is not reason to doubt the need for it.

"The Austin report did a trick of cutting off the time axis to hide that total infections do not seem to change that much under the different social distancing policies; everything just gets dragged out."

This uncharitably misportrays the obvious policy goals, which are to slow the spread, develop test-and-trace capacity, reopen with tracing, and control it *until a vaccine*. Obviously, the goal is then that total infections are drastically reduced relative to not intervening.

"But, if the selection concern is right, the pessimism might be misplaced if the late epidemic R_0 is lower, potentially leading to a much lower effective spread rate and the possibility of killing the thing off at some point before it infects the number of people required to create the level of immunity the models are predicted require. This seems feasible based on South Korea and maybe China, at least for areas in the US that are not already out of control."

I'm not sure what "feasible" means. South Korea clearly has testing and tracing capacity that we do not, which is why we need a shutdown to develop it. Perhaps the author thinks it important that this would have been possible if we had prepared at all. Or perhaps he means theoretically possible. Who would deny these things?

Overall, this seems uncharitable and confused.

"Lombardy confirmed cases are at .5% population, I think. Maybe that's really off and it is like 2%, 3%, etc. "

A lot of people think it's way higher than that. We'll probably know fairly soon from antibody testing, but I think most estimates put it at least 10 to 1 actual to confirmed (so 5%), and some evidence suggests it could be significantly higher than that.

https://www.chicagotribune.com/coronavirus/ct-nw-nyt-coronavirus-superspreaders-20200412-3amxri45ijh2nmiics5e4n726e-story.html

As the coronavirus tears through the country, scientists are asking: Are some people more infectious than others? Are there superspreaders, people who seem to just spew out virus, making them especially likely to infect others?

It seems that the answer is yes. There do seem to be superspreaders, a loosely defined term for people who infect a disproportionate number of others, whether as a consequence of genetics, social habits or simply being in the wrong place at the wrong time.

But those virus carriers at the heart of what are being called superspreading events can drive and have driven epidemics, researchers say, making it crucial to figure out ways to identify spreading events or to prevent situations, like crowded rooms, where superspreading can occur.

Just as important are those at the other end of the spectrum: people who are infected but unlikely to spread the infection.

Distinguishing between those who are more infectious and those less infectious could make an enormous difference in the ease and speed with which an outbreak is contained, said Jon Zelner, an epidemiologist at the University of Michigan. If the infected person is a superspreader, contact tracing is especially important. But if the infected person is the opposite of a superspreader, someone who for whatever reason does not transmit the virus, contact tracing can be a wasted effort.
----
At this point we have identified the super-spread scenarios in NYC and set aside reserves for them. The superspreaders: ER rooms, jails, mass transit, etc.

I am going to take a wild guess here: I think the remaining super spreaders are exactly those with a predisposition for an immune overreaction. These folks do not have the right antibody, and the virus can thus proliferate, generating both macrophages and spew into the environment.

"C’mon people, stop your yapping ...": I thought you Dems said "C'mon, man". Or is that only Dems who are slowly fading into dementia?

"See also Robin Hanson’s earlier post on variation in R0."

Was that really about variation? I thought it was really about *persistence* of R0: you could have a societal R0 well below 1, but with one or more sub-populations with an R0 significantly above 1 that was fixed, after a while you could still end up with an increase in cases.

But maybe I just didn't understand the math properly.

https://www.the-sun.com/news/672794/fauci-lives-saved-us-shut-down-earlier/

FAUCI DOWNLOADS ON TRUMP:
LIVES COULD HAVE BEEN SAVED IF SHUT EARLIER

----
Kind of ex post. In the last couple of weeks Dr. Fauci learned a great deal about the virus; he learned it at the same rate as the trials were happening.

We know the disease's development pretty well now, folks, and yes, if we could time travel we could take today's knowledge with us. I have a ticket on the Wayback Machine; tomorrow your past will be reversed.

Northerners live in densely populated urban areas, but aren't all that sociable. Southerners live in sparsely populated rural areas, but are very sociable. Do we demonize northerners or southerners? I'm sorry to say this, but sociable southerners are being sociable this Easter. Expect a spike in covid cases.

https://www.newsmax.com/us/who-coronavirus-outbreaks-vaccine/2020/04/12/id/962492/

World Health Organization Special Envoy Dr. David Nabarro warned Sunday that COVID-19 is a virus that will "stalk" the globe until there’s a vaccine to protect against it.

In an interview on NBC News’ “Meet The Press,” Nabarro said testing to pick up the coronavirus will be key to containing outbreaks.

"We think it is going to be a virus that stalks the human race for quite a long time to come until we can all have a vaccine that will protect us,” he said. “ There will be small outbreaks that will emerge sporadically and they will break through our defenses.”
----
The common cold is not being eradicated. Instead the docs are going to get the Nobel for understanding the whole variety of coronas, which is what the docs were trying to do in the Wuhan lab. Basically the docs are gearing up to treat the severely allergic, and the Nobel prize will be handed to the docs that develop an allergy test for this virus, with a super Nobel if they develop an allergy test for all classes of corona.

Perhaps as importantly, there are no nursing homes in those models. Nursing homes have been the source of something like 1/3 to 1/2 of deaths all around, and the most obvious measure for reducing fatalities drastically is curbing the spread to and within them.

We can and sometimes we do write posts about this: https://thezvi.wordpress.com/2020/04/07/on-r0/

The 10% figure seems too low, because some places also doing distancing are well past that, but I do think 25% might do it.

Being socially high-contact may not be constant over time within individuals; i.e., the 33% of individuals responsible for 66% of the social contact may not be a fixed set. The simulation that produces the 10% number seems to assume it is.

To the extent the pandemic is disruptive to the labor market, you may have individuals substituting in and out of high-contact jobs.

Yes! It's easy for an armchair analyst to come up with a "superior" model that accounts for ... whatever's being left out of the current models. Such as heterogeneous R0 values.

But as you state, those R0 values are not fixed for a given person. People can, and probably will, change their behavior -- but in ways that might not be predictable.

So the allegedly superior models that incorporate heterogeneity are themselves inadequate.

But this is nothing new, it's true for all sciences especially the social sciences. The models always leave something out, that's why they're models and not reality. Or as Henri Theil said, "models are to be used, not believed".

Good stuff! One thing I haven't seen discussed (pointers welcome): some of the dire predictions involved up to 80% of everyone getting infected. I'm curious how they estimate the final infection rate and the possibility of actually hitting that number. This may not be relevant, but looking at the data on influenza pandemics from wiki (grain of salt... blah blah blah), estimates of infection rates average 10-30% over the past 60 years, and 20-60% in 1889 and 1918, which seems like maybe it isn't the best guidance.

You are dealing with two different notions of infection rates. The infection rates you report for flu are *annual* infection rates: that's how many people get the flu each year. The *lifelong* infection rate for flu is likely over 90%: mostly everyone gets some strain of the flu at least once in his/her life.

When people are talking about an 80% infection rate for Covid-19, they mean that eventually (unless an effective vaccine comes early) 80% of the population will get infected, but not necessarily all this year. Of course the spread of Covid-19 is and will continue to be much faster than a normal flu, because the population started with no immunity to this virus. But still, one year is very short for infecting 80% of the population.

What about people who on 9 out of 10 days have very little contact with people, but every once in a while have a lot of contact?

We could call this the Tom Hanks Disease selection problem. If the virus starts spreading early among the highly popular, such as Tom Hanks, then its R0 will be higher early on than later, when it reaches the less popular.

My guess is that the political prejudices of the public health establishment, growing out of AIDS and Ebola, were not at all ready for a disease that spreads most rapidly among society's most respectable elements. Instead, most of the establishment's emphasis in January and February was on preventing Stigmatization of the Marginalized.

But instead the main vectors appeared to be the highly un-marginal, like Tom Hanks, various first ladies, skiers, and so forth.

By the way, I want to thank in advance all the commenters who will write in to let us know that Tom Hanks isn't popular with them.

Thanks for your contributions. I don't know what we could do without them.

A quick search in Google scholar using "Heckman selection" and "epidemiology" as search terms returns 649 articles. It appears that epidemiologists are familiar with statistical methods from the 1970s.

+1

As someone who only reads the pop-sci literature, I'd be shocked if these considerations weren't being taken into account. They've known about network effects and power-law distributions in disease transmission for decades.

I returned from the Wayback Machine, with Mr. Peabody.
Your past is not what your past was five minutes ago. The previous past had hundreds of millions dying in the streets, mad dogs running wild, and Mel Gibson in a truck. I fixed that with a few post-it notes applied correctly, and now you have a new past.

You're welcome. Glad I could save the world.
