Epidemiology and selection problems and further heterogeneities

Richard Lowery emails me this:

I saw your post about epidemiologists today.  I have a concern similar to point 4 about selection, based on what I have seen being used for policy in Austin.  It looks to me like the models being used for projection calibrate R_0 off of the initial doubling rate of the outbreak in an area.  But, if people who are most likely to spread to a large number of people are also more likely to get infected early in an outbreak, you end up with what looks kind of like a classic Heckman selection problem, right? In any observable group, there is going to be an unobserved distribution of contact frequency, and it would seem potentially first order to account for that.

As far as I can tell, if this criticism holds, the models are going to (1) be biased upward, predicting a far higher peak in the absence of policy intervention and (2) overstate the likely severity of an outcome without policy intervention, while potentially understating the value of aggressive containment measures.  The epidemiology models I have seen look really pessimistic, and they seem like they can only justify any intervention by arguing that the health sector will be overwhelmed, which now appears unlikely in a lot of places.  The Austin report did a trick of cutting off the time axis to hide that total infections do not seem to change that much under the different social distancing policies; everything just gets dragged out.

But, if the selection concern is right, the pessimism might be misplaced if the late epidemic R_0 is lower, potentially leading to a much lower effective spread rate and the possibility of killing the thing off at some point before it infects enough people to create the level of immunity the models predict is required.  This seems feasible based on South Korea and maybe China, at least for areas in the US that are not already out of control.
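To make "calibrating R_0 off the initial doubling rate" concrete, here is a minimal sketch. It assumes a plain SIR model with an exponentially distributed infectious period of mean D days, for which the early growth rate r and R_0 are related by R_0 = 1 + r·D, with r = ln 2 / doubling time. The 3-day doubling time and 5-day infectious period are illustrative values, not figures from the Austin report. The selection worry is that r is measured while the most-connected people dominate transmission, so feeding the resulting R_0 into a homogeneous model may overstate how fast spread continues once they have been infected.

```python
import math


def r0_from_doubling_time(doubling_days: float, infectious_days: float) -> float:
    """Estimate R_0 from an observed early doubling time.

    Assumes a simple SIR model with an exponentially distributed infectious
    period of mean `infectious_days`, for which R_0 = 1 + r * D, where
    r = ln(2) / doubling time is the early exponential growth rate.
    """
    growth_rate = math.log(2) / doubling_days  # per day
    return 1 + growth_rate * infectious_days


# Illustrative inputs only: a 3-day doubling time and a 5-day infectious period.
print(round(r0_from_doubling_time(3.0, 5.0), 2))  # 2.16
```

The estimate is sensitive to the assumed infectious period: the same 3-day doubling time with a 10-day period gives roughly 3.3.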

I do not know the answers to the questions raised here, but I do see the debate on Twitter becoming more partisan, more emotional, and less substantive.  You cannot say that about this communication.  From the MR comments this one — from Kronrad — struck me as significant:

One thing both economists and epidemiologists seem to be lacking is an awareness for the problems of aggregation. Most models in both fields see the population as one homogenous mass of individuals. But sometimes, individual variation makes a difference in the aggregate, even if the average is the same.

In the case of pandemics, it makes a big difference how the infection rate varies across the population. Most models assume that it is the same for everyone. But in reality, human interactions are not evenly distributed. Some people shake hands all day, while others spend their days mostly alone in front of a screen. This uneven distribution has an interesting effect: those who spread the virus the most are also the most likely to get it. This means that the infection rate looks much higher at the beginning of a pandemic, but sinks once the super spreaders have had the disease and gained immunity. It also means herd immunity is reached much earlier: not after 70% of the population is immune, but after the people who are involved in 70% of all human interactions are immune. On average, this is the same. But in practice, it can make a big difference.

I did a small simulation on this and came to the conclusion that with a recursively applied Pareto distribution, where 1/3 of all people are responsible for 2/3 of all human interaction, herd immunity is already reached when 10% of the population has had the virus. So individual variation in the infection rate can make an enormous difference that is not captured in aggregate models.

My quick and dirty simulation can be found here:

See also Robin Hanson’s earlier post on variation in R0.  C’mon people, stop your yapping on Twitter and write some decent blog posts on these issues.  I know you can do it.
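Kronrad's simulation is not reproduced here, so what follows is an independent minimal sketch of the same idea, not his code. It assumes proportionate mixing: each person has an activity level a that scales both how easily they catch and how easily they pass on the virus, so a susceptible in group i escapes infection with probability exp(-a_i·Φ), where Φ is the cumulative force of infection, and the effective reproduction number is R_0·Σ p a² s / Σ p a². "Recursively applied" is interpreted here (an assumption) as splitting every group 1/3 : 2/3 with a 2/3 : 1/3 share of its contacts at each level. The function reports the share of the population ever infected when R_eff first falls to 1; with a single homogeneous group it reduces to the textbook 1 - 1/R_0. R_0 = 2.5 is an illustrative value.

```python
import math


def recursive_pareto_groups(depth, top_share=1/3, top_contacts=2/3):
    """Hypothetical population split: at each level, the top `top_share` of
    every group accounts for `top_contacts` of that group's contacts.
    Returns (population_fraction, activity) pairs with mean activity 1."""
    groups = [(1.0, 1.0)]  # (population fraction, share of all contacts)
    for _ in range(depth):
        split = []
        for p, w in groups:
            split.append((p * top_share, w * top_contacts))
            split.append((p * (1 - top_share), w * (1 - top_contacts)))
        groups = split
    return [(p, w / p) for p, w in groups]


def herd_immunity_fraction(groups, r0):
    """Share of the population ever infected when R_eff first reaches 1.

    Assumes proportionate mixing, with susceptibility and infectiousness both
    proportional to activity, so each group's susceptible share is
    exp(-activity * phi) for a common cumulative force of infection phi.
    """
    norm = sum(p * a * a for p, a in groups)

    def r_eff(phi):
        return r0 * sum(p * a * a * math.exp(-a * phi) for p, a in groups) / norm

    lo, hi = 0.0, 1.0
    while r_eff(hi) > 1.0:  # widen the bracket until R_eff drops below 1
        hi *= 2
    for _ in range(200):  # bisect on phi; r_eff is decreasing in phi
        mid = (lo + hi) / 2
        if r_eff(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    phi = (lo + hi) / 2
    return 1.0 - sum(p * math.exp(-a * phi) for p, a in groups)


r0 = 2.5  # illustrative value
print("homogeneous:", round(herd_immunity_fraction([(1.0, 1.0)], r0), 3))
for depth in (1, 2, 3):
    share = herd_immunity_fraction(recursive_pareto_groups(depth), r0)
    print(f"depth {depth}:", round(share, 3))
```

Each extra level concentrates contacts in a smaller group and pushes the threshold further below 1 - 1/R_0; how far depends entirely on the assumed distribution. This is the threshold, not the final number infected: an unmitigated epidemic overshoots it.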


Yes! These nonlinear effects hobble both simple plans (open or closed) and simple models of compliance (those looking at "degree" of agreement).

The first problem is the data problem. The incidence of infection is higher than is being reported. There are strategies designed to ensure the data is understated. For example, Trump has excused corporations from providing timely and accurate data. Deaths that are not tested for the virus are not recorded as virus deaths, also understating the occurrence. Relying on the federal database in making decisions means using bad data. There are better databases not infected with the virus of manipulation. The Atlanta Journal-Constitution just reported on the rank corruption the Governor has been utilizing in Georgia. The official base is understated partly because testing is nowhere near where it needs to be and various parties are working to ensure it stays that way.

It's true that in normal times we don't worry too much about what other people die of, especially far away.

Suddenly it would be nice to be a full autopsy nation.

("The percentage of deaths for which an autopsy was performed declined more than 50 percent from 1972 through 2007, from 19.3 percent to 8.5 percent..")

Now regress “likelihood of autopsy” on the murder rate and age of death. And now you understand why it’s declined.

Or not, statistics are clearly out of your depth.

we bet the fellow's bold claim that
"herd immunity is already reached when 10% of the population had the virus."
is not gonna reproduce

That's not his point. He never claimed that his model was accurately calibrated, just that most models ignore this effect and that plugging in some (probably not correct, but not obviously completely crazy) values for it has a huge effect. Also, herd immunity doesn't mean that only 10% of people get infected, just that 10% is the inflection point where things switch from steadily growing to starting to die out. The actual portion of the population infected in the end would be larger (potentially significantly, e.g. several times larger), depending on various factors.

His point simply boils down to this: the people most likely to spread it (widely) are probably also the people most likely to catch it early, and thus R should fall more quickly over the course of the epidemic than a simple SIR model would predict, and the total population that ends up infected will also be lower than SIR models predict. The more this varies across the population, the stronger the effect.
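Both claims here, that the threshold gets overshot and that heterogeneity lowers the final total as well, can be checked with the classic final-size relation. This sketch assumes proportionate mixing with activity scaling both catching and spreading; the two-group split (1/3 of people holding 2/3 of contacts) and R_0 = 2.5 are illustrative. It solves for the cumulative force of infection at the end of an unmitigated epidemic by fixed-point iteration; for one homogeneous group it reduces to the standard SIR equation z = 1 - exp(-R_0 z).

```python
import math


def final_size(groups, r0, iterations=10_000):
    """Fraction of the population ever infected in an unmitigated epidemic.

    `groups` holds (population_fraction, activity) pairs. Under proportionate
    mixing the cumulative force of infection phi satisfies
        phi = r0 * sum(p * a * (1 - exp(-a * phi))) / sum(p * a * a),
    and group i ends with susceptible share exp(-a_i * phi).
    """
    norm = sum(p * a * a for p, a in groups)

    def update(phi):
        return r0 * sum(p * a * (1 - math.exp(-a * phi)) for p, a in groups) / norm

    # update() never exceeds this bound, so starting here puts us above the
    # positive fixed point and the iteration decreases monotonically onto it.
    phi = r0 * sum(p * a for p, a in groups) / norm
    for _ in range(iterations):
        phi = update(phi)
    return 1.0 - sum(p * math.exp(-a * phi) for p, a in groups)


r0 = 2.5
homogeneous = [(1.0, 1.0)]
two_groups = [(1/3, 2.0), (2/3, 0.5)]  # 1/3 of people have 2/3 of contacts
print("threshold, homogeneous:", 1 - 1 / r0)
print("final size, homogeneous:", round(final_size(homogeneous, r0), 3))
print("final size, two groups:", round(final_size(two_groups, r0), 3))
```

In the homogeneous case the final size (about 89%) sits well above the 60% threshold, which is the overshoot; with the two groups the final size drops to roughly 63%.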

This thread describes a very interesting simulation from a mathematician, using connected graphs, that basically explores the same ideas about heterogeneity in R0.


He also claims that we observe lower total infection rates in the real world than would be expected from simple SIR models.

Also excellent.

On a side note, on the difficulty that this mathematician encountered when he tried to put his argument on arXiv, see

Wow, that thread is from March 22, before it was obvious that things won't blow up. That thread should have had way more visibility.

what am I showing you


ask anyone who turns 102 this year

Now that Tyler has outed himself as an anti-elite bad person, we can ignore what he has to say.

Blindsided again!

From what has been inflicted on the world, I wonder if epidemiologists, the pandemic professoriate, are not akin to Fed economists.

We either need 9,000 more of them or we need to fire all of them and start over.

This alleged overestimation makes a lot of sense to me, now that I see it explained. Of course, I did not think of that myself. But I am not an epidemiologist, let alone a smart one.

However, it seems to me that this theory, if it holds, might be beyond my intelligence, but it is still basic enough to have been considered by at least one of the legions of epidemiologists who have existed since the inception of epidemiology, whenever that was.

What can explain why the models seem not to consider this apparently very important factor? Might it be that overestimating is actually a feature, not a bug? It makes sense, considering the incentives of epidemiologists. The more serious the issue, the more important they become in the eyes of society, and the more the budgets of their departments increase. Again, might it be just a Public Choice issue?

Overestimation is also a feature of the climate models. There's a tendency to make the situation look worse. It justifies your paycheck some. Why do we need so many climate scientists if the situation is under control?

Has that ever made sense? Total revenues for the oil and gas drilling sector were $3.3 trillion in 2019, but somehow the financial incentives were *still* with environmentalists?

No, it sounds like the answer is somehow making it out in spite of financial incentives.

Very obvious incentives running the other way.

The environmentalists have their own incentives: their careers, their reputations, their convictions. Regardless, the model predictions run hot relative to the actual warming.

Now explain why the Arctic climate models have been so flawed, as the actual reported data varies considerably from more comforting consensus predictions involving a longer time scale for what has been observed over the last decade.

Climate science modelling can be deeply flawed, and the Arctic is a prime example of just how unreliable it is.

That was good? It just made me feel old and tired. When faced with a hard, multivariate, prediction problem .. just name your political enemies? I guess, if you want to tell us who you are.

As far as "the Arctic as a prime example" .. it's a region, not a model, for goodness sake. The number of models pertaining to the region is unbounded.

Old and tired x 2.

In this thread anonymous believes any phenomenon only has financial incentives that impact one party and are unidirectional.

Groundbreaking, if he’s right every firm can increase their prices to infinity tomorrow.

Also he’s solved corruption and the principal agent problem in one response.

Brilliant as always

"The more serious the pandemic, the more epidemiologists become important in the eyes of society, and the more the budget of their departments increase."

A hammer wants a nail. An epidemiologist wants a shutdown.

>I do see the debate on Twitter becoming more partisan, more emotional, and less substantive.

This coming from a solidly-partisan Tyler Cowen?

I can only conclude that you've realized you are losing.

I don't know if Tyler Cowen is solidly partisan, but I agree with him that the debate is becoming more partisan and less substantive, and with you that it is because one side of the debate (the side in favor of aggressive lockdown) is realizing it is losing.

Scientifically, I never saw much substance coming from this side of the debate (it was mostly argument from authority, emotional links to articles describing horrible deaths, or "evidence that can't be made public"). Now that evidence is coming in that the mortality rate everywhere is much lower than what was supposed to justify the lockdown, and that in most places (like NYC) the health care system did not come even close to saturation, destroying the whole "flattening-the-curve" argument, I consider the scientific debate closed (doing pure math is more interesting anyway). What remains is the political debate, and of course it is more emotional and less substantive.

Please stop being partisan. The epidemiologists became irrelevant when policy makers failed to heed their advice 6 weeks ago. NZ heeded that advice and is heading to zero virus in two weeks (with a death rate below 0.5%). Do you not understand that you are seeing deaths and economic disruption due to a US failure of governance (not science) arising from partisan behaviours?

"I do not know the answers to the questions raised here"

Collect data and test people. It really is that simple in places like South Korea or Germany, and these days, Italy.

And ignore twitter completely, of course. Along with people whose models are not based on any actual data, not that an absence of data appears to be a problem for economists.

And as a final tip, anyone using the term herd immunity without reference to a vaccine is easily dismissed.

Please grace us with why herd immunity cannot be used without referencing a vaccine.

I'm here to learn.

He’s an expert, he was fired from the GMU public relations department for incompetence 30 years ago.

Now he’s here to share his wisdom

On Twitter: Over the past week or two I've looked at Twitter more, to see if there's anything useful I'm missing. I've reaffirmed that Twitter is awful -- just noise and bias-confirmation. I can understand how it's addictive, with a steady stream of novelty with a low bar to accessing it, but the lack of substance is really obvious, and the net result is simply to increase anxiety or cocoon people into bubbles. The only good things on it are (1) pointers to articles and blog posts; (2) tweets that are basically just images, for example of interesting foods; (3) glimpses into the psyches of groups that one would never intersect with, for example odd political fringes of different countries. I'm not sure #3 is actually insightful, rather than oddly entertaining. There are a handful of twitter threads that have something vaguely approaching the structure and depth of a real argument, but these just highlight how poor twitter is for this sort of layout, and I really can't understand why the tweet isn't instead: "here's a link to a real post or article."

I get a little disgusted when someone posts a fifteen-tweet thread to make their argument, using the 1/15, 2/15, 3/15 numbering system and all.

Good thing blogs are dead and newspapers are gone.

The whole twitter thing is funny. It makes me think of carrying on a debate by telegraph.

First 10 word comment. STOP. Second 10 word comment. STOP.

10 word reply. STOP.

It is intrinsically not a medium for any serious discussion.

I agree. I used to think Twitter was some new and possibly useful type of writing. But we've been writing for a few millennia now and maybe we've already figured out what works. Almost the only time I get value out of Twitter is when there is a multi-tweet post that takes enough time to make a real argument (versus send out a telegram). And what is a multi-tweet thread but a BLOG ARTICLE that is unfortunately chopped into little pieces? I give up. Trying to learn from Twitter is like trying to learn from skywriting. (And like mkt42 below I sometimes go to it when sent by someone writing a coherent blog post -- but again, this just shows the weakness of Twitter, if one has to rely on someone else to "curate" it.)

Yes, I never go to twitter, with but two exceptions.

If someone such as Tyler links to it, then thanks to his selection/curation (as well as my estimation of whether I should click or not, I do so maybe 1/3 of the time) there's at least a 50% chance I'll find some interesting tweets. Without that, the chance of finding something worthwhile is less than 1%.

Once or twice a day twitter generates an email to me with tweets that its algorithms think I might find interesting. After perhaps a decade it's finally become semi-decent at that. I'll click on perhaps 10% of their proffered tweets, and be satisfied perhaps half the time.

What continues to drive me nuts is when someone, especially the people who are allegedly good tweeters, posts a tweet that says "read this, it's great" and then a link -- with no explanation of why it's great. I've learned not to bother clicking on those, unless I have nothing better to do with my time. Even if it's a person who generally posts quality tweets, that link is usually disappointing. When they provide some description of what it's about, I can better decide if I want to click on it or not.

Your last paragraph needs to be highlighted. Out of principle, only click when it is evident a person has put effort into their tweet.

I did a small simulation on this and came to the conclusion that with recursively applied Pareto-distribution where 1/3 of all people are responsible for 2/3 of all human interaction, herd immunity is already reached when 10% of the population had the virus.

interesting. Maybe he can explain this in a conference of "experts" who all share their similarly quick-and-dirty Excel macros.

It should be obvious that it would be moronic to make policy decisions based on these hypotheses, but the experts are so in love with their models that I fear someone might go ahead and push for "let it rip" herd immunity based on this.

Sigh. Endless September theme. Seems like people are so stupid about numbers. Only I know. No man is an island, except Ray Lopez...

Let's do the math (again), class: critical threshold (%) for herd immunity = 1 - 1/R0. Hence, if R0 = 2.2, then 1 - 1/2.2 = 55%. But if R0 = 5.7, as claimed by the paper below*, the herd immunity threshold = 1 - 1/R0 = 1 - 1/5.7 = 82%, much higher.

Hence if due to 'social distancing' (or heterogeneous groups of people, some of them 'spreaders' and some 'stay at home wall flowers' same thing) R0 = 1.1, then "herd immunity" results from a mere 10% of the population being infected. Great!! But it presupposes social distancing takes place for as long as it takes for herd immunity to kill off the C-19 virus, and that might be six months or longer?! Not so great. On the other hand, if indeed R0= 5.7 for C-19 virus (without any social distancing) then a gut-wrenching 82% of the population must get infected before C-19 dies off. And that means grandma dies.

Got it now? Somehow I doubt it...

*paper argues that R0 = 5.7 without any social distancing. https://wwwnc.cdc.gov/eid/article/26/7/20-0282_article - Volume 26, Number 7—July 2020 High Contagiousness and Rapid Spread of Severe Acute Respiratory Syndrome Coronavirus 2
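The arithmetic above can be reproduced in a few lines. This applies the textbook threshold 1 - 1/R for a homogeneously mixing population, which is exactly the assumption Kronrad's comment questions. The R values are the ones quoted in the comment; R = 1.1 stands for an effective reproduction number under distancing rather than R0, and 1 - 1/1.1 works out to about 9%, close to the "mere 10%" quoted.

```python
def herd_immunity_threshold(r: float) -> float:
    """Classic threshold 1 - 1/R for a homogeneously mixing population.

    Returns 0 when R <= 1, since the epidemic is already shrinking.
    """
    return max(0.0, 1.0 - 1.0 / r)


for r in (1.1, 2.2, 5.7):
    print(f"R = {r}: {herd_immunity_threshold(r):.0%}")
```

This prints 9%, 55% and 82%. The low figure only holds for as long as R stays at the reduced value.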

"I did a small simulation on this and came to the conclusion that with recursively applied Pareto-distribution where 1/3 of all people are responsible for 2/3 of all human interaction, herd immunity is already reached when 10% of the population had the virus."

I don't follow the math of why herd immunity would be reached at 10% under these assumptions.

The possibility that only 10% of the population could get COVID is less likely to lead to let-it-rip behavior than the perceived eventuality that 70% of the population will get it. You doofus.

This model doesn't actually argue for 'let it rip' behavior; indeed it depends a bit on the parameters but if anything it argues for the opposite.

It's the standard models that actually support "let it rip" more strongly, because restrictions don't have that strong an effect on the final total number of people infected; it just takes longer. So restrictions have to be justified either by driving it down to (near) extinction, by keeping hospitals operational and saving lives that way, or as a delay until better treatments arrive.

Here, in addition, early restrictive action limits the spread while the people most likely to spread it widely (being themselves among the early ones affected) have it, and since R falls more quickly over time with this model that leads to a more substantial reduction in the total number of people infected. It argues more that policies should start very restrictive then ease up slowly over time (though the exact policies and how they interact with various populations matter).

"One thing both economists and epidemiologists seem to be lacking is an awareness for the problems of aggregation"

Kronrod's (not Kronrad's) post was fine, although economists would probably call it a problem of heterogeneity rather than of aggregation but that's a minor semantic quibble.

The problem with this call, not just by Kronrod but by innumerable commenters and pundits, for corrections for heterogeneity, confounding factors, covariates, etc. is that in the early stages of the epidemic we lack the data to accomplish this sort of analysis.

We can try to make all sorts of nice upgrades of our models, but we quickly find that our data sets have insufficient observations or missing variables or both.

We can make some crude and obvious observations and corrections for age: older people are more vulnerable. Maybe gender: males seem to be more vulnerable, but is that due to gender or to other variables?

But only now are we starting to get data that allows us to go beyond that with much confidence. County level data in the US is probably still not useful except for finding that the virus started in big cities and spread elsewhere. Even country level data is questionable -- what was different in Italy? Can China's data be trusted?

One small advantage that economists have: the importance of distinguishing between endogenous vs exogenous variables is built into their thinking. Most analysts recognize that R0 is an endogenous variable and not a constant, but economists are more likely to build models that attempt to analyze its determinants (note again however the limitations of the data) and to recognize the limitations of models that utilize R0 but that lack a good way of determining what the value of R0 will be.

"Even country level data is questionable -- what was different in Italy? Can China's data be trusted?"

Given the Trump appointed administration fires those who "tell the truth", like the guy in charge of people running a big boat, can any US data be trusted?

Trump denies that the outbreaks in the US began before his "China ban", and that the worse strain came from Europe before his "EU ban" (the EU has open borders between its states, just as the US has open borders between States). A medical company meeting in Boston and tourists in Florida and Louisiana were major sources seeding outbreaks around the US.

"The" CDC has documented a couple of cases, a person going to Chicago for a birthday party and funeral, the people who traveled after the Biogen conference, a lavish party in a NYC suburb, but has not organized distributed test capacity into building a coherent database on infections and spread.

Instead the "data" comes from a number of independent groups long involved in tracing infection spread, or public interest groups building their own data bases because the CDC, Trump, and most State governments aren't.

Of course, comprehensive data requires a massive and very costly deep State: it requires paying workers to do testing, to write reports, to collect reports, to analyze reports, and to follow up on questions the analysis triggers. When you have ten thousand people collecting data, say for the mandated vital records, people want both more data and more action taken on the data, so those 10,000 workers grow to 100,000, and all the users of the data demand that it continue and become more accurate, which requires paying more workers.

E.g., to prevent waste, fraud, and abuse, the "death" vital record must be reported to the Federal government quickly. But that data is only name, birth info, Social Security number, and place of death, and it goes not to the CDC but to part of Treasury. The CDC gets death data from State vital records reports as much as 15-18 months later.

People Trump appointed want data collection privatized so it can be sold for profit. Notably weather data, but private for profit vital records would be both a source of power and profit.

Nothing in that post contributed to this conversation. If your goal is to be partisan, please visit your choice of Fox/MSNBC. Thanks.

Your point is valid for situations where we lack data. But this case is not about a lack of data. It is about an inherent bias of the standard models: they systematically overestimate the infection rate at the later stages of a pandemic because they assume homogeneity. Yes, we do not know exactly how heterogeneous R is, but that does not justify working with the most extreme implicit assumption of total homogeneity.

If nothing else, it should inform the uncertainty range, and it does have policy implications. It means that a herd immunity approach is more viable, but that it should be accomplished with initially strong restrictions that relax over time rather than any kind of "let it rip" scenario (at least assuming that the restrictions have a homogeneous effect, which of course they don't, so the restrictions and their effects also need modeling assumptions about their heterogeneity). There is, of course, heterogeneity in the outcome of being infected to consider as well. I think this does lend support to something like that British paper's suggestion that, when the lockdowns start to be lifted, it be done by age band (e.g. 20-30 year olds who live alone or with other 20-30 year olds being allowed out of lockdown first), though I'm not sure it's feasible from a policy perspective.

Which is an extremely simple question that the heterogeneity modellers should have no problem answering. Schools were in session until the lockdown, and children certainly spread the virus, so it should be no problem for such models to already provide answers to how much schools are involved in spread.

And if such models are unable to provide any insight into elementary or high school spread, then maybe it might make sense to gather more data on a previously unknown virus instead of wasting time playing with models.

We need to know a lot more about the demography and lifestyles of the infected versus the non-infected: occupations, hobbies, family composition, commuting methods, work space density, meeting frequency, store shopping frequency, and the like. Collecting this ought to be part of the flow of getting tested.

Right. This is very important to reopening the economy.

For example, what is the infection / hospitalization / death rate among flight attendants? Is it higher or lower than for other occupations? If we knew the flight attendant rate, we could approximate the relative risk of being an airline passenger, which could help us predict the future of airlines, which could help investors make money on the stock market.

R0 does have a large variance, which most models ignore for simplicity. There are the super spreaders. The observation that if you're more likely to infect others, you're more likely to be infected yourself is a good one.
There is probably a lower limit to the size of an infection cluster, below which it stops and doesn't propagate.
There are hermits who infect no one. In the middle are a bunch of average people who still have to go to school, to work, to church, to a gym class, go out to lunch, and might see a movie, take a trip, and the like.
Many clusters seem to start from an event, and I am not sure they can be modeled as coming from a single super spreader: the Gangelt carnival near Heinsberg, the choir in Washington state, the Kirkland Life Care nursing home (started from a birthday party there), the soccer match in Lombardy, the church gathering in Korea.
I think a single infected man traveling on a plane infected 37 people on that same plane with SARS in 2002, and SARS may be less infectious than Covid-19.
Such events will occur for the average person, especially before the economy is shut down.
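The observation that people who infect more are also more likely to be infected has a simple quantitative form. If transmission travels along contacts, the person at the far end of a randomly chosen contact has, on average, Σk²/Σk contacts rather than the population mean Σk/n, because someone with k contacts sits at the end of k of them. A minimal sketch with a made-up contact list (one person with four contacts, four people with one each, i.e. a star):

```python
def mean_contacts(degrees):
    """Average number of contacts per person."""
    return sum(degrees) / len(degrees)


def mean_contacts_of_a_contact(degrees):
    """Average number of contacts of the person at the end of a random contact.

    People with k contacts appear at the end of k contacts, so the average is
    weighted by k: sum(k^2) / sum(k).
    """
    return sum(k * k for k in degrees) / sum(degrees)


# Hypothetical star-shaped group: one hub with four contacts, four leaves.
degrees = [4, 1, 1, 1, 1]
print(mean_contacts(degrees))               # 1.6
print(mean_contacts_of_a_contact(degrees))  # 2.5
```

The two averages coincide only when everyone has the same number of contacts, which is the homogeneity assumption of simple models.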

"late epidemic R_0"

Isn't R_0 defined as R at t=0? How does late epidemic R_0 make sense? Do these models really keep R constant?

they don't keep R constant, but they do usually assume that it's uniform over the population (so the only reduction in R is from x% of people already having it; e.g. if R_0 was 2.0 then it would be modeled as 1.0 once 50% of people had already had the disease)
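The difference between the two assumptions fits in a few lines. Under uniform mixing, R_eff = R_0 × S, where S is the overall susceptible share. Under proportionate mixing (an assumption here: activity scales both catching and spreading), each group's susceptibles are weighted by activity squared, so immunity among high-contact people counts for more. The groups below are hypothetical.

```python
def r_eff_uniform(r0, susceptible):
    """Uniform mixing: R shrinks in proportion to the susceptible share."""
    return r0 * susceptible


def r_eff_by_activity(r0, groups):
    """Proportionate mixing; `groups` holds (population_fraction, activity,
    susceptible_fraction). Weights each group's susceptibles by activity**2."""
    num = sum(p * a * a * s for p, a, s in groups)
    den = sum(p * a * a for p, a, _ in groups)
    return r0 * num / den


# One third of people (holding 2/3 of contacts) already immune; the rest not.
groups = [(1/3, 2.0, 0.0), (2/3, 0.5, 1.0)]
overall_susceptible = sum(p * s for p, _, s in groups)
print(round(r_eff_uniform(2.0, overall_susceptible), 3))  # 1.333
print(round(r_eff_by_activity(2.0, groups), 3))           # 0.222
```

With the same 1/3 of the population immune, the uniform model still has the epidemic growing while the activity-weighted one has it shrinking fast. When every group has the same susceptible share, the two formulas agree.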

So individual variation in the infection rate can make an enormous difference that is not captured in aggregate models.
Not at equilibrium. The hospital entry and exit rates have to be stable. So hospitals expect the occasional burst in R0 and allocate reserve bed space for the expected occurrence. The hospitals act like a hedge fund.

Epidemiological models were way off when it came to asbestos. The main models had asbestos-related disease diagnoses peaking two decades ago, with a dramatic drop-off as old people exposed decades ago died off from natural causes.

As you can see from all the asbestos-attorney commercials, the model was off. Attorneys also found ways to ramp up testing in areas that were not tested before.

The bottom line is all of these models should be treated as being extremely sensitive to the underlying assumptions, which are often imperfect.

This post starts with the premise that epidemiologists do not use network analytics (which includes clustering, etc., and network structure) in analyzing spread.

Far from the truth.

Why don't you look at the research that has studied epidemics from the network perspective.

Look at econ Prof. Jackson's work at Stanford where he has analyzed health and financial contagion models.

The level of discussion in the post and comments reflects, sadly, that people have not looked very deeply into this subject and simply speak from their own limited knowledge.

Pick up Prof. Jackson's books on the economics of networks, which has a ton of stuff on pandemics and network structures, disease transmission models and network structure. If you are just discovering this now, as the post and comments suggest, that reflects a real lack of knowledge which should be corrected.

Start with Matthew O Jackson, Social and Economic Networks, (disease transmission models and networks). Here is a link to a recent comment he made on the need to coordinate across countries to respond to this crisis: https://news.stanford.edu/press-releases/2020/03/26/coordinated-respnavirus-pandemic/

If you want to look at the models, just google "epidemiological models and networks". You will see the papers that epidemiologists (and some economists) write on this subject as it relates to disease and other subjects.

Likewise, it took me less than a minute on Wikipedia to find a discussion of heterogeneous models in epidemiology.
If it's going to come down to a pissing contest my guess would be that epidemiologists are better scientists than economists because they know they aren't going to get rich from it.

I’m familiar with Dr Jackson’s work.

Please point to one example of his work being used in any CDC publication/press conference or mainstream news analysis of Corona-chan. Anywhere. At all. It’s not.

Every model being pushed is based on homogeneity.

Now you understand the blog post! Yay! Learning is fun

The blog post was ostensibly about "epidemiology", not about the Straussian CDC or about mainstream news analysis.


How are you familiar with his work?

I say you are a Phoney. You haven't read any of his work.

Evidently you have not read his book Social and Economic Networks, because there are well over 45 pages on this subject, not including diffusion, random graph, and other models, in addition to SIR and SIS models and the specific discussions of disease transmission across all models.

A challenge to anyone reading Skeptical's comment: buy Jackson's book and read it.

Skeptical, you are a Phoney.

But, if the selection concern is right, the pessimism might be misplaced if the late epidemic R_0 is lower, potentially leading to a much lower effective spread rate and the possibility of killing the thing off at some point before it infects enough people to create the level of immunity the models predict is required.
If the lockdown is focused on containing the virus on a neighborhood basis, what is the probability that all neighborhoods are immune? This is equivalent to asking the probability that a hospital will go out of business. This is a tough one; the math guys are still reading the proofs on this. The main problem with the model is that the hospital adapts: it removes some of its reserve beds and lays off some workers as the virus decreases. It is very difficult to get absolute probabilities in an adaptive system. There are no obvious arbitrage points to measure; the hospital, to reduce costs, is constantly adjusting its hedges.

Here is the only closed loop study that I'm aware of that uses real data: It's a small German town with high incidence of SARS-CoV-2 and they were able to go in and get serological data on a representative sample.

I don't follow Twitter, but I do read all the preprints that are coming out. I've read my share of modeling papers, and some of them have been spot on about the date when the case load peaked. I've read some of Hanson's posts, but as with others, it's all conjecture and not worth getting in a lather about. Serological testing is the only reliable way to establish a baseline. Other countries seem to be doing this, yet only one test has been deemed ready here in the US.

Link: https://www.land.nrw/sites/default/files/asset/document/zwischenergebnis_covid19_case_study_gangelt_en.pdf

I discuss a related problem with epidemiological models in section 2 of this article. Those models ignore social circles, local clustering, and the regularity of contacts, especially the close contacts that matter most for Covid-19 transmission.


For spreading person to person, this makes sense to me. For spreading location to location, it makes less sense.

People talking to Bill, the popular guy in the office, means Bill’s more likely to catch it. Bill should get sick early. We all get sick too.

Kate works in a supermarket. She is 17 and healthy. As people pay her at the till, they meet her for the first time. She touches all their goods and talks to them good-naturedly. Kate gets sick, and so do a lot of people. While she is sick, Tom mans the register. Tom is 18 and healthy.

If the graph is just human social connections then the sparsity, with a few hubs, means the channels of infection decline quickly. The open paths close quickly in the graph. If the graph is denser, like shared use of city space, and the least connected nodes all still have at least several connections, then there is always another path, especially if immunity only lasts 3-24 months and the virus manages to stay alive in the meantime.
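The sparse-versus-dense point can be illustrated with a toy network. This is a hypothetical sketch, not anything from the comment: it builds a 1,000-person hub-dominated graph (preferential attachment, one link per newcomer) and a denser graph in which everyone has at least six contacts, makes the best-connected 10% immune (assuming, as the selection argument suggests, that hubs are infected first), and reports how much of the population still sits in one connected group of susceptibles. The graph sizes, the six-contact floor and the 10% figure are all assumptions.

```python
import random
from collections import deque


def hub_graph(n, rng):
    """Preferential attachment, one link per newcomer: a sparse tree with a few big hubs."""
    adj = {0: {1}, 1: {0}}
    ends = [0, 1]  # each node listed once per link it has, so popular nodes get picked more
    for v in range(2, n):
        u = rng.choice(ends)
        adj[v] = {u}
        adj[u].add(v)
        ends.extend([u, v])
    return adj


def dense_graph(n, k, rng):
    """Everyone links to k random others, so nobody has fewer than k contacts."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for u in rng.sample([x for x in range(n) if x != v], k):
            adj[v].add(u)
            adj[u].add(v)
    return adj


def immunize_hubs(adj, fraction):
    """Hypothetical assumption: the best-connected people were infected (and recovered) first."""
    ranked = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    return set(ranked[: int(fraction * len(adj))])


def largest_open_cluster(adj, immune):
    """Share of the population in the biggest connected group of susceptibles (BFS)."""
    seen = set(immune)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        best = max(best, size)
    return best / len(adj)


rng = random.Random(42)
graphs = {"sparse": hub_graph(1000, rng), "dense": dense_graph(1000, 6, rng)}
results = {name: largest_open_cluster(g, immunize_hubs(g, 0.10)) for name, g in graphs.items()}
for name, share in results.items():
    print(f"{name}: largest open cluster holds {share:.0%} of people")
```

With the hubs removed, the tree-like graph should break into small pieces while the dense graph stays almost fully connected, which is the commenter's contrast between paths closing quickly and there always being another path.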

I like the thinking behind the graph model. To what extent can locations be nodes, either through indirect contact (recycled air, touching the same surface), or through recycled personnel (everyone on the counter gets sick, but we always replace them)?

"If the graph is just human social connections then the sparsity, with a few hubs ..."
In NYC, those hubs include half the population riding packed subways to work for weeks. Your social networks do not show up until the mass meetings are dispersed, simply because the hospitals do not have the liquidity to cover both the massive gatherings and the social networks.
Hospitals are like our Congress, trying to cover mass subsidies to the states and also small regular payments to social networks; there is not enough liquidity, so Congress cycles. Hospitals will cycle until the mass gatherings are hedged one way or the other, and it seems they have been.

New York City largely exists as a giant agglomeration in order to facilitate face to face contact among national and global elites.

Nice to see some epidemiological thinking rather than idiocy in the weeds of mathematical modeling. I guess you already realised that you have explained why infection spreads slowly in a rural hinterland, even with poor "social distancing", compared with a dense metro area, even with good social distancing.

I don't see why you'd take that email seriously enough to broadcast.

"It looks to me like the models being used for projection calibrate R_0 off of the initial doubling rate of the outbreak in an area. ... the models are going to (1) be biased upward, predicting a far higher peak in the absence of policy intervention"

I don't see how the author gets to "far higher". If he thinks R0 naturally becomes 2 rather than 3, then herd immunity comes at 50% rather than 67%. Is the latter "far higher"? Not relative to what makes this relevant, namely things like hospital capacity: both would mean far more demand than availability. Lombardy confirmed cases are at .5% population, I think. Maybe that's really off and it is like 2%, 3%, etc. Maybe we have more capacity. Still, as far as I can see, the change in modelling hasn't made any real difference to the problem. Does the author think R0 magically drops below 1 naturally *with less than, say, 5% of the population exposed*? He should say so, if he thinks this is even a remote possibility. I don't see how.
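The 50% and 67% figures come from the textbook homogeneous-mixing threshold, 1 - 1/R0. For contrast, the sketch below also includes a closed form derived for the case where susceptibility (or exposure) follows a gamma distribution with coefficient of variation cv, which is one way to formalize the selection effect in the original email: the most susceptible are infected first, so the effective R falls faster than the susceptible share. Treat that second function, and the cv=1 value used below, as an illustrative assumption rather than an estimate for Covid-19.

```python
def herd_immunity_threshold(r0):
    """Homogeneous mixing: R_eff = R0 * S drops below 1 once 1 - 1/R0 are immune."""
    return 1 - 1 / r0


def herd_immunity_threshold_gamma(r0, cv):
    """Assumes gamma-distributed susceptibility with coefficient of variation cv.
    cv = 0 recovers the homogeneous case; larger cv lowers the threshold."""
    return 1 - r0 ** (-1 / (1 + cv ** 2))


for r0 in (2.0, 3.0):
    print(
        f"R0={r0:.0f}: homogeneous {herd_immunity_threshold(r0):.0%}, "
        f"with cv=1 {herd_immunity_threshold_gamma(r0, 1.0):.0%}"
    )
```

How much the threshold drops depends entirely on how large and how persistent the heterogeneity is, which is what the rest of the thread argues about.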

"The epidemiology models I have seen look really pessimistic, and they seem like they can only justify any intervention by arguing that the health sector will be overwhelmed, which now appears unlikely in a lot of places. "

This seems like the obvious and well-publicized mistake: intervention worked, so you say it wasn't necessary. Does the author think this appears unlikely because of some natural process? What could it be? Or does he think it unlikely because of social distancing policy? If so, this unlikelihood is no reason to doubt the need for it.

"The Austin report did a trick of cutting off the time axis to hide that total infections do not seem to change that much under the different social distancing policies; everything just gets dragged out."

This uncharitably misportrays the obvious policy goals, which are to slow the spread, develop testing and tracing, reopen with tracing, and control the virus *until a vaccine*. Obviously, the goal is then that total infections are drastically reduced relative to not intervening.

"But, if the selection concern is right, the pessimism might be misplaced if the late epidemic R_0 is lower, potentially leading to a much lower effective spread rate and the possibility of killing the thing off at some point before it infects the number of people required to create the level of immunity the models are predicted require. This seems feasible based on South Korea and maybe China, at least for areas in the US that are not already out of control."

I'm not sure what "feasible" means. South Korea clearly has testing and tracing capacity that we do not, which is why we need a shutdown to develop it. Perhaps the author thinks it important that this would be possible if we could go back in time and prepare at all. Or perhaps he means theoretically possible. Who would deny these things?

Overall, this seems uncharitable and confused.

"Lombardy confirmed cases are at .5% population, I think. Maybe that's really off and it is like 2%, 3%, etc. "

A lot of people think it's way higher than that. We'll probably know fairly soon from antibody testing, but I think most estimates put it at least 10 to 1 actual to confirmed (so 5%), and some evidence suggests it could be significantly higher than that.
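The step from 0.5% confirmed to roughly 5% infected is a single multiplication by the assumed number of actual infections per confirmed case. Writing it out makes the sensitivity to that unknown ratio explicit. A minimal sketch; the ratio of 20 is a hypothetical stand-in for "significantly higher", and using one ratio ignores any change in testing over time.

```python
def implied_infected_share(confirmed_share, actual_per_confirmed):
    """Scale confirmed cases by an assumed under-ascertainment ratio (capped at 100%)."""
    return min(1.0, confirmed_share * actual_per_confirmed)


confirmed = 0.005  # ~0.5% of Lombardy's population confirmed, the figure quoted above
for ratio in (1, 10, 20):  # 20 is a hypothetical "significantly higher" ratio
    share = implied_infected_share(confirmed, ratio)
    print(f"{ratio:>2} actual per confirmed -> {share:.1%} infected")
```

Antibody surveys, as the comment says, are what would pin the ratio down.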


As the coronavirus tears through the country, scientists are asking: Are some people more infectious than others? Are there superspreaders, people who seem to just spew out virus, making them especially likely to infect others?

It seems that the answer is yes. There do seem to be superspreaders, a loosely defined term for people who infect a disproportionate number of others, whether as a consequence of genetics, social habits or simply being in the wrong place at the wrong time.

But those virus carriers at the heart of what are being called superspreading events can drive and have driven epidemics, researchers say, making it crucial to figure out ways to identify spreading events or to prevent situations, like crowded rooms, where superspreading can occur.

Just as important are those at the other end of the spectrum: people who are infected but unlikely to spread the infection.

Distinguishing between those who are more infectious and those less infectious could make an enormous difference in the ease and speed with which an outbreak is contained, said Jon Zelner, an epidemiologist at the University of Michigan. If the infected person is a superspreader, contact tracing is especially important. But if the infected person is the opposite of a superspreader, someone who for whatever reason does not transmit the virus, contact tracing can be a wasted effort.
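The excerpt does not say how superspreading is modelled. One common formalization, used here as an assumption rather than as the article's method, treats each case's number of onward infections as negative binomial with mean R and dispersion k: a small k means most cases infect no one and a few infect many. The sketch compares an illustrative k=10 with k=0.1; R=2.5 and both k values are hypothetical.

```python
import random


def share_with_no_secondary_cases(r, k):
    """Negative binomial offspring distribution (mean r, dispersion k): P(0 secondary cases)."""
    return (1 + r / k) ** (-k)


def share_of_transmission_from_top(r, k, top=0.2, n=100_000, seed=1):
    """Draw each case's expected number of infections from Gamma(shape=k, mean=r)
    and return the fraction of all expected transmission caused by the top `top` fraction."""
    rng = random.Random(seed)
    nu = sorted((rng.gammavariate(k, r / k) for _ in range(n)), reverse=True)
    return sum(nu[: int(top * n)]) / sum(nu)


r = 2.5  # hypothetical average number of secondary cases
for k in (10.0, 0.1):  # large k: everyone similar; small k: strong superspreading
    print(
        f"k={k}: {share_with_no_secondary_cases(r, k):.0%} infect nobody, "
        f"top 20% cause {share_of_transmission_from_top(r, k):.0%} of transmission"
    )
```

Under the low-k assumption, most traced chains end immediately while a small share of cases drive most spread, which is the contrast Zelner draws between tracing a superspreader and tracing someone who transmits to no one.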
At this point we have identified the superspreading scenarios in NYC and set aside reserves for them. The superspreaders: ERs, jails, mass transit, etc.

I am going to take a wild guess here: I think the remaining superspreaders are exactly those with a predisposition for an immune overreaction. These folks do not have the right antibody, so the virus can proliferate, generating both macrophages and spew into the environment.

"C’mon people, stop your yapping ...": I thought you Dems said "C'mon, man". Or is that only Dems who are slowly fading into dementia?

"See also Robin Hanson’s earlier post on variation in R0."

Was that really about variation? I thought it was really about *persistence* of R0: you could have a societal R0 well below 1, but if one or more subpopulations had a fixed R0 significantly above 1, after a while you could still end up with an increase in cases.
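The persistence reading is easy to see with a toy two-group calculation, assuming (hypothetically) no mixing between the groups. The observed, society-wide ratio of cases from one generation to the next starts well below 1 and then climbs past it as the subgroup with R above 1 comes to dominate. The group sizes and R values are made up for illustration.

```python
def total_cases_by_generation(r_by_group, initial_cases, generations):
    """Each subpopulation multiplies its own cases by its R every generation.
    No mixing between groups is a simplifying assumption."""
    cases = list(initial_cases)
    totals = [sum(cases)]
    for _ in range(generations):
        cases = [r * c for r, c in zip(r_by_group, cases)]
        totals.append(sum(cases))
    return totals


# Hypothetical: most cases sit in a group with R=0.5, a few in a group with R=1.5.
totals = total_cases_by_generation((0.5, 1.5), (900, 100), generations=10)
growth = [later / earlier for earlier, later in zip(totals, totals[1:])]
for gen, (total, g) in enumerate(zip(totals[1:], growth), start=1):
    print(f"generation {gen:2d}: {total:8.0f} cases, observed R {g:.2f}")
```

The first-generation ratio is 0.6, yet the totals eventually grow, so an early society-wide estimate below 1 says little about where things end up if such a subgroup exists.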

But maybe I just didn't understand the math properly.



Kind of ex post. In the last couple of weeks Dr. Fauci has learned a great deal about the virus, learning it at the same rate as the trials are happening.

We know the disease's development pretty well now, folks, and yes, I wish we could time travel and take today's knowledge with us. I have a ticket on the Wayback Machine; tomorrow your past will be reversed.

Northerners live in densely populated urban areas, but aren't all that sociable. Southerners live in sparsely populated rural areas, but are very sociable. Do we demonize northerners or southerners? I'm sorry to say this, but sociable southerners are being sociable this Easter. Expect a spike in covid cases.


World Health Organization Special Envoy Dr. David Nabarro warned Sunday that COVID-19 is a virus that will "stalk" the globe until there’s a vaccine to protect against it.

In an interview on NBC News’ “Meet The Press,” Nabarro said testing to pick up the coronavirus will be key to containing outbreaks.

"We think it is going to be a virus that stalks the human race for quite a long time to come until we can all have a vaccine that will protect us,” he said. “ There will be small outbreaks that will emerge sporadically and they will break through our defenses.”
The common cold is not being eradicated. Instead the docs are going to get the Nobel for understanding the whole variety of coronaviruses, which is what the docs were trying to do in the Wuhan lab. Basically the docs are gearing up to treat the severely allergic, and the Nobel Prize will go to the docs who develop an allergy test for this virus, with a super Nobel if they develop an allergy test for all classes of coronavirus.

Perhaps as importantly, there are no nursing homes in those models. Nursing homes have been the source of something like 1/3 to 1/2 of deaths everywhere, and the most obvious measure for drastically reducing fatalities is curbing the spread to and within them.

We can and sometimes we do write posts about this: https://thezvi.wordpress.com/2020/04/07/on-r0/

10% seems too high because some places also doing distancing are well past that, but I do think 25% might do it.

Being socially high-contact may not be constant over time within individuals, i.e. the 33% of individuals responsible for 66% of the social contact may not be a fixed group. The simulation that shows the 10% number seems to assume it is.

To the extent the pandemic is disruptive to the labor market, you may have individuals substituting in and out of high-contact jobs.
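Whether the selection effect survives people moving in and out of high-contact roles can be checked with a small compartmental sketch. Everything here is an assumption chosen for illustration: two groups where one third of people have four times the contact activity of the rest (so they account for two thirds of contacts, matching the 33%/66% figure above), activity-weighted random mixing, R0 = 2.5, a 10-day average infectious period, and daily time steps. The `reshuffle` flag is the extreme version of the comment's point: susceptibles are redistributed across groups every day. This is not the simulation the comment refers to, whose details are not given here.

```python
def herd_immunity_attack_rate(r0, groups, reshuffle, gamma=0.1, seed=1e-4, max_days=2000):
    """Discrete-time (daily) SIR with activity-weighted proportional mixing.

    groups: (population share, relative contact activity) pairs; shares sum to 1.
    reshuffle: if True, susceptibles are spread back across groups in proportion
    to group size every day, i.e. being high-contact is not a fixed trait.
    Recovered people are not tracked. Returns the share ever infected when the
    effective R first falls below 1.
    """
    shares = [s for s, _ in groups]
    act = [a for _, a in groups]
    mean_act = sum(s * a for s, a in groups)
    # Pick beta so the next-generation matrix has dominant eigenvalue r0 at the start.
    beta = r0 * gamma * mean_act / sum(s * a * a for s, a in groups)
    S = [s * (1 - seed) for s in shares]
    I = [s * seed for s in shares]
    for _ in range(max_days):
        r_eff = beta / gamma * sum(a * a * sg for a, sg in zip(act, S)) / mean_act
        if r_eff < 1:
            return 1 - sum(S)
        force = beta * sum(a * ig for a, ig in zip(act, I)) / mean_act
        new = [min(sg, a * force * sg) for a, sg in zip(act, S)]
        I = [ig + n - gamma * ig for ig, n in zip(I, new)]
        S = [sg - n for sg, n in zip(S, new)]
        if reshuffle:
            total_s = sum(S)
            S = [s * total_s for s in shares]
    raise RuntimeError("effective R never fell below 1")


r0 = 2.5
groups = [(1 / 3, 4.0), (2 / 3, 1.0)]  # a third of people make two thirds of contacts
fixed = herd_immunity_attack_rate(r0, groups, reshuffle=False)
mixed = herd_immunity_attack_rate(r0, groups, reshuffle=True)
print(f"homogeneous formula 1 - 1/R0: {1 - 1 / r0:.0%}")
print(f"high-contact group fixed:     {fixed:.0%}")
print(f"reshuffled every day:         {mixed:.0%}")
```

When the high-contact group is fixed, it is depleted first and the effective R drops below 1 well before the 1 - 1/R0 mark; with daily reshuffling the threshold moves back to roughly the homogeneous value. Partial turnover would presumably land somewhere in between.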

Yes! It's easy for an armchair analyst to come up with a "superior" model that accounts for ... whatever's being left out of the current models, such as heterogeneous R0 values.

But as you state, those R0 values are not fixed for a given person. They can, and probably will, change their behavior -- but in ways that might not be predictable.

So the allegedly superior models that incorporate heterogeneity are themselves inadequate.

But this is nothing new, it's true for all sciences especially the social sciences. The models always leave something out, that's why they're models and not reality. Or as Henri Theil said, "models are to be used, not believed".

Good stuff! One thing I haven't seen discussed (pointers welcome): some of the dire predictions involved up to 80% of everyone getting infected. I'm curious how they estimate the final infection rate and the possibility of actually hitting that number. This may not be relevant, but looking at the data on influenza pandemics from Wikipedia (grain of salt, blah blah blah), estimates of infection rates average 10-30% over the past 60 years, and 20-60% in 1889 and 1918, which maybe isn't the best guidance.
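For the textbook homogeneous SIR model with no intervention and no change in behavior, the eventual share infected z solves the final-size relation z = 1 - e^(-R0 z). A minimal sketch of solving it numerically follows. Whether any particular projection used this relation is not stated in the thread, so treat it as a reference point only. It also shows the overshoot: the final size lands well above the 1 - 1/R0 herd immunity threshold because infections continue after the effective R falls below 1.

```python
import math


def final_size(r0, tol=1e-12):
    """Share ever infected in the homogeneous SIR model with no intervention:
    the positive root of z = 1 - exp(-r0 * z), found by bisection."""
    if r0 <= 1:
        return 0.0

    def f(z):
        return z - 1 + math.exp(-r0 * z)

    lo, hi = 1e-9, 1.0  # f(lo) < 0 and f(hi) > 0 whenever r0 > 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for r0 in (1.5, 2.0, 2.5, 3.0):
    print(f"R0={r0}: {final_size(r0):.0%} eventually infected, herd immunity at {1 - 1 / r0:.0%}")
```

Under this formula, an 80% final size corresponds to an R0 of about 2; heterogeneity and behavior change of the kind discussed above would pull the number down.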

You are dealing with two different notions of infection rates.
The infection rates you report for flu are *annual* infection rates, i.e. how many people get the flu each year. The *lifelong* infection rate for flu is likely over 90%: almost everyone gets some strain of the flu at least once in his or her life.

When people are talking about an 80% infection rate for Covid-19, they mean that eventually (unless an effective vaccine comes early) 80% of the population will get infected, but not necessarily all this year.
Of course the spread of Covid-19 is and will continue to be much faster than a normal flu, because the population started with no immunity to that virus. But still one year is very short for infecting 80% of the population.
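The gap between an annual rate and a lifetime rate is a compounding effect. A minimal sketch, assuming a constant annual attack rate and independence between years (both simplifications, since immunity and strain changes make real years correlated): even the 10-30% annual range cited above implies that most people are infected within a decade or two.

```python
def share_ever_infected(annual_rate, years):
    """Chance of at least one infection over `years`, assuming (hypothetically)
    the same independent annual attack rate every year."""
    return 1 - (1 - annual_rate) ** years


for annual in (0.10, 0.20, 0.30):
    print(
        f"{annual:.0%} a year -> {share_ever_infected(annual, 10):.0%} within 10 years, "
        f"{share_ever_infected(annual, 40):.0%} within 40 years"
    )
```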

What about people who on 9 out of 10 days have very little contact with people, but every once in a while have a lot of contact?

We could call this the Tom Hanks Disease selection problem. If the virus starts spreading early among the highly popular, such as Tom Hanks, then its R0 will be higher than later on when it reaches the less popular.

My guess is that the political prejudices of the public health establishment, growing out of AIDS and Ebola, were not at all ready for a disease that spreads most rapidly among society's most respectable elements. Instead, most of the establishment's emphasis in January and February was on preventing Stigmatization of the Marginalized.

But instead the main vectors appeared to be the highly un-marginal, like Tom Hanks, various first ladies, skiers, and so forth.

By the way, I want to thank in advance all the commenters who will write in to let us know that Tom Hanks isn't popular with them.

Thanks for your contributions. I don't know what we could do without them.

A quick search in Google scholar using "Heckman selection" and "epidemiology" as search terms returns 649 articles. It appears that epidemiologists are familiar with statistical methods from the 1970s.


As someone who only reads the pop sci literature, I’d be shocked if these considerations weren’t being taken into account. They’ve known about network effects and power law distributions with disease transmission for decades.

I returned from the Wayback Machine, with Mr. Peabody.
Your past is not what your past was five minutes ago. The previous past had hundreds of millions dying in the streets, mad dogs running wild, and Mel Gibson in a truck. I fixed that with a few Post-it notes applied correctly, and now you have a new past.

You're welcome. Glad I could save the world.
