A highly qualified reader emails me on heterogeneity

I won’t indent further, all the rest is from the reader:

“Some thoughts on your heterogeneity post. I agree this is still bafflingly under-discussed in “the discourse” & people are grasping onto policy arguments but ignoring the medical/bio aspects since ignorance of those is higher.

Nobody knows the answer right now, obviously, but I did want to call out two hypotheses that remain underrated:

1) Genetic variation

This means variation in the genetics of people (not the virus). We already know that (a) mutations in single genes can lead to extreme susceptibility to other infections, e.g. Epstein–Barr (usually harmless but sometimes severe) and tuberculosis; (b) mutations in many genes can cause disease susceptibility to vary. Diabetes (WHO link) and heart disease are two examples, which is why when you go to the doctor you are asked if you have a family history of these.

It is unlikely that COVID is type (a), but it’s quite likely that COVID is type (b). In other words, I expect that there is a certain set of genes which (if you have the “wrong” variants) predispose you to have a severe case of COVID, another set of genes which (if you have the “wrong” variants) predispose you to have a mild case, and if you’re lucky enough to have the right variants of these you are most likely going to get a mild or asymptomatic case.

There has been some good preliminary work on this which was also under-discussed:

You will note that the majority of doctors/nurses who died of COVID in the UK were South Asian. This is quite striking. https://www.nytimes.com/2020/04/08/world/europe/coronavirus-doctors-immigrants.html — Goldacre et al’s excellent paper also found this on a broader scale (https://www.medrxiv.org/content/10.1101/2020.05.06.20092999v1). From a probability point of view, this alone should make one suspect a genetic component.

There is plenty of other anecdotal evidence to suggest that this hypothesis is likely as well (e.g. entire families all getting severe cases of the disease suggesting a genetic component), happy to elaborate more but you get the idea.

Why don’t we know the answer yet? We unfortunately don’t have a great answer yet for lack of sufficient data, i.e. you need a dataset that has patient clinical outcomes + sequenced genomes, for a significant number of patients; with this dataset, you could then correlate the presence of gene variants {a,b,c} with severe disease outcomes and draw some tentative conclusions. These are known as GWAS (genome-wide association studies), as you probably know.

The dataset needs to be global in order to be representative. No such dataset exists, because of the healthcare data-sharing problem.
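To make the correlation step concrete, here is a minimal sketch of a single-variant association test on a toy dataset. Everything in it is an assumption for illustration: the patient records are invented, the variant labels "a", "b", "c" just echo the placeholders above, and the function names are hypothetical. A real GWAS would test hundreds of thousands of variants, adjust for ancestry and other covariates, and correct for multiple testing; this only shows the basic idea of comparing carriers and non-carriers.

```python
from math import comb

# Hypothetical toy records: (set of variants carried, had a severe outcome?)
records = [
    ({"a", "b"}, True),
    ({"a"}, True),
    ({"a", "c"}, True),
    ({"b"}, False),
    ({"c"}, False),
    (set(), False),
    ({"a"}, False),
    ({"b", "c"}, True),
]

def contingency(records, variant):
    """Return (carrier severe, carrier mild, non-carrier severe, non-carrier mild)."""
    a = b = c = d = 0
    for variants, severe in records:
        carrier = variant in variants
        if carrier and severe:
            a += 1
        elif carrier:
            b += 1
        elif severe:
            c += 1
        else:
            d += 1
    return a, b, c, d

def odds_ratio(a, b, c, d):
    """Odds ratio with 0.5 added to each cell so empty cells do not divide by zero."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

def fisher_one_sided(a, b, c, d):
    """Probability of seeing at least `a` severe carriers if carrying the
    variant were unrelated to severity (hypergeometric upper tail)."""
    n_carriers = a + b
    n_severe = a + c
    n = a + b + c + d
    upper = min(n_carriers, n_severe)
    tail = sum(
        comb(n_carriers, k) * comb(n - n_carriers, n_severe - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, n_severe)

table = contingency(records, "a")
print(table, odds_ratio(*table), fisher_one_sided(*table))
```

With only eight invented patients the p-value is far from significant, which is the point about needing a large dataset: the same odds ratio only becomes convincing with many more patients.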

2) Strain

It’s now mostly accepted that there are two “strains” of COVID, that the second arose in late January and contains a spike protein variant that wasn’t present in the original ancestral strain, and that this new strain (“D614G”) now represents ~97% of new isolates. The Sabeti lab (Harvard) paper from a couple of days ago is a good summary of the evidence. https://www.biorxiv.org/content/10.1101/2020.07.04.187757v1 — note that in cell cultures it is 3-9x more infective than the ancestral strain. Unlikely to be that big of a difference in humans for various reasons, but still striking/interesting.

Almost nobody was talking about this for months, and only recently was there any mainstream coverage of this. You’ve already covered it, so I won’t belabor the point.

So could this explain Asia/heterogeneities? We don’t know the answer, and indeed it is extremely hard to figure out the answer (because as you note each country had different policies, chance plays a role, there are simply too many factors overall).

I will, however, note that the distribution of each strain by geography is very easy to look up, and the results are at least suggestive:

  • Visit Nextstrain (Trevor Bedford’s project)
  • Select the most significant variant locus on the spike protein (614)
  • This gives you a global map of the balance between the more infective variant (G) and the less infective one (D) https://nextstrain.org/ncov/global?c=gt-S_614
  • The “G” strain has grown and dominated global cases everywhere, suggesting that it really is more infective
  • A cursory look here suggests that East Asia mostly has the less infective strain (in blue) whereas the rest of the world is dominated by the more infective strain:
  • [image: Nextstrain global map colored by genotype at spike position 614]
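The regional comparison in the steps above amounts to tallying the D and G variants per region. A minimal sketch, assuming you have already exported one (region, residue at position 614) pair per sequenced sample; the sample list below is made up for illustration and is not real Nextstrain data, and the function name is hypothetical:

```python
from collections import Counter, defaultdict

# Hypothetical samples: (region, amino acid at spike position 614)
samples = [
    ("East Asia", "D"),
    ("East Asia", "D"),
    ("East Asia", "G"),
    ("Western Europe", "G"),
    ("Western Europe", "G"),
    ("Western Europe", "G"),
    ("Western Europe", "D"),
]

def genotype_share(samples):
    """Return, for each region, the fraction of samples carrying each residue."""
    counts = defaultdict(Counter)
    for region, residue in samples:
        counts[region][residue] += 1
    return {
        region: {residue: n / sum(c.values()) for residue, n in c.items()}
        for region, c in counts.items()
    }

for region, share in genotype_share(samples).items():
    print(region, share)
```

Raw shares like these depend heavily on which countries sequence the most and when the samples were taken, so they are suggestive at best, as noted above.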

– Compare Western Europe, dominated by the “yellow” (more infective) strain: