What should we believe and not believe about R?

This is from my email, highly recommended, and I will not apply further indentation:

“Although there’s a lot of pre-peer-reviewed and strongly-incorrect work out there, I’ll single out Kevin Systrom’s work as being deeply problematic. Estimating R from noisy real-world data when you don’t know the underlying model is fundamentally difficult, but a minimal baseline capability is to get sign(R-1) right (at least when |R-1| isn’t small), and it is going to often be badly (and confidently) wrong about that because it fails to account for how the confirmed count data it’s based on is noisy enough to be mostly garbage. (Many serious modelers have given up on case counts and just model death counts.) For an obvious example, consider their graph for WA: it’s deeply implausible on its face that WA had R=.24 on 10 April and R=1.4 on 17 April. (In an epidemiological model with fixed waiting times, the implication would be that infectious people started interacting with non-infectious people five times as often over the course of a week with no policy changes.) Digging into the data and the math, you can see that a few days of falling case counts will make the system confident of a very low R, and a few days of rising counts will make it confident of a very high one, but we know from other sources that both can and do happen due to changes in test and test processing availability. (There are additional serious methodological problems with it, but trying to nowcast R from observed case counts is already garbage-in so will be garbage-out.)

However, folks are (understandably, given the difficulty and the rush) missing a lot of harder stuff too. You linked a study and wrote “Good and extensive west coast Kaiser data set, and further evidence that R doesn’t fall nearly as much as you might wish for.” We read the study tonight, and the data set seems great and important, but we don’t buy the claims about R at all — we think there are major statistical issues. (I could go into it if you want, although it’s fairly subtle, and of course there’s some chance that *we’re* wrong…)

Ultimately, the models and statistics in the field aren’t designed to handle rapidly changing R, and everything is made much worse by the massive inconsistencies in the observed data. R itself is a surprisingly subtle concept (especially in changing systems): for instance, Systrom’s model uses a simple relationship between R and the observed rate of growth, but their claimed relationship only holds for the simplest SIR model (not epidemiologically plausible at all for COVID-19), and it has as an input the median serial interval, which is also substantially uncertain for COVID-19 (they treat it as a known constant). These things make it easy to badly misestimate R. Usually these errors pull or push R away from 1 (Systrom’s model would at least get sign(R – 1) right if their data weren’t garbage and they fixed other statistical problems), but of course getting sign(R – 1) right is a low bar, it’s just figuring out whether what you’re observing is growing or shrinking. Many folks would actually be better off not trying to forecast R and just looking carefully at whether they believe the thing they’re observing is growing or shrinking and how quickly.

All that said, the growing (not total, but mostly shared) consensus among both folks I’ve talked to inside Google and with academic epidemiologists who are thinking hard about this is:

  • Lockdowns, including Western-style lockdowns, very likely drive R substantially below 1 (say .7 or lower), even without perfect compliance. Best evidence is the daily death graphs from Italy, Spain, and probably France (their data’s a mess): those were some non-perfect lockdowns (compared to China), and you see a clear peak followed by a clear decline after basically one time constant (people who died at peak were getting infected right around the lockdown). If R was > 1 you’d see exponential growth up to herd immunity, if R was 0.9 you’d see a much bigger and later peak (there’s a lot of momentum in these systems). This is good news if true (and we think it’s probably true), since it means there’s at least some room to relax policy while keeping things under control. Another implication is the “first wave” is going to end over the next month-ish, as IHME and UTexas (my preferred public deaths forecaster; they don’t do R) predict.
  • Cases are of course massively undercounted, but the weight of evidence is that they’re *probably* not *so* massively undercounted that we’re anywhere near herd immunity (though this would of course be great news). Looking at Iceland, Diamond Princess, the other studies, the flaws in the Stanford study, we’re very likely still at < ~2-3% infected in the US. (25% in large parts of NYC wouldn’t be a shock though).

Anyways, I guess my single biggest point is that if you see a result that says something about R, there’s a very good chance it’s just mathematically broken or observationally broken and isn’t actually saying that thing at all.”

That is all from Rif A. Saurous, Research Director at Google, currently working on COVID-19 modeling.

Currently it seems to me that those are the smartest and best informed views “out there,” so at least for now they are my views too.
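
To make the email's point about the growth-rate relationship concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the email or from any of the models discussed. It compares two textbook ways of converting an observed exponential growth rate r into R: the simplest SIR relationship, R = 1 + r*T, and the relationship for a fixed interval between infections, R = exp(r*T). The function names and the candidate serial intervals T (4, 5 and 7 days) are assumptions chosen for illustration.

```python
import math


def growth_rate_from_doubling(doubling_days):
    """Daily exponential growth rate r implied by a doubling time."""
    return math.log(2) / doubling_days


def r_sir(growth_rate, serial_interval):
    """R under the simplest SIR model: R = 1 + r * T."""
    return 1.0 + growth_rate * serial_interval


def r_fixed_interval(growth_rate, serial_interval):
    """R if every transmission occurs exactly T days after infection: R = exp(r * T)."""
    return math.exp(growth_rate * serial_interval)


# Hypothetical inputs: counts doubling every 3 days, three candidate serial intervals.
r = growth_rate_from_doubling(3.0)
for serial_interval in (4.0, 5.0, 7.0):
    print(
        f"T={serial_interval:.0f} days: "
        f"SIR R={r_sir(r, serial_interval):.2f}, "
        f"fixed-interval R={r_fixed_interval(r, serial_interval):.2f}"
    )
```

Both formulas agree on sign(R-1), since each returns exactly 1 when r is 0, but the magnitude of R depends heavily on the assumed model and on the assumed serial interval. That is the sense in which treating either one as known can badly misestimate R.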

COVID Prevalence and the Difficult Statistics of Rare Events

In a post titled Defensive Gun Use and the Difficult Statistics of Rare Events I pointed out that it’s very easy to go wrong when estimating rare events.

Since defensive gun use is relatively uncommon under any reasonable scenario there are many more opportunities to miscode in a way that inflates defensive gun use than there are ways to miscode in a way that deflates defensive gun use.

Imagine, for example, that the true rate of defensive gun use is not 1% but .1%. At the same time, imagine that 1% of all people are liars. Thus, in a survey of 10,000 people, there will be 100 liars. On average, 99.9 (~100) of the liars will say that they used a gun defensively when they did not and .1 of the liars will say that they did not use a gun defensively when they did. Of the 9900 people who report truthfully, approximately 10 will report a defensive gun use and 9890 will report no defensive gun use. Adding it up, the survey will find a defensive gun use rate of approximately (100+10)/10000=1.1%, i.e. more than ten times higher than the actual rate of .1%!
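
The arithmetic above can be checked with a short Python sketch. The function name is hypothetical, and the sketch assumes, as the example does, that a "liar" always reports the opposite of the truth.

```python
def reported_rate(n_surveyed, true_rate, liar_rate):
    """Share of respondents reporting a defensive gun use when liars invert their answer."""
    liars = n_surveyed * liar_rate
    truthful = n_surveyed - liars
    # Liars who never used a gun defensively but say that they did.
    false_yes = liars * (1.0 - true_rate)
    # Truthful respondents who really did use a gun defensively.
    true_yes = truthful * true_rate
    return (false_yes + true_yes) / n_surveyed


# Numbers from the example: 10,000 respondents, true rate .1%, 1% liars.
rate = reported_rate(10_000, 0.001, 0.01)
print(f"reported rate: {rate:.2%}, ratio to true rate: {rate / 0.001:.1f}x")
```

Setting liar_rate to zero recovers the true rate, which makes it easy to see how much of the measured rate is pure misreporting.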

Epidemiologist Trevor Bedford points out that a similar problem applies to tests of COVID-19 when prevalence is low. The recent Santa Clara study found a 1.5% rate of antibodies to COVID-19. The authors assume a false positive rate of just .005 and a false negative rate of ~.2, i.e. a test that catches ~.8 of true positives. Thus, if you test 1000 individuals, ~5 will show up as having antibodies when they actually don’t, and x*.8 will show up as having antibodies when they actually do, where x is the number who truly have antibodies. Since (5+x*.8)/1000=.015, x=12.5, so the true rate is 12.5/1000=1.25%; thus the reported rate is pretty close to the true rate. (The authors then inflate their numbers up for population weighting, which I am ignoring.) On the other hand, suppose that the false positive rate is .015, which is still very low and not implausible; then we can easily have ~15/1000=1.5% showing up as having antibodies to COVID when none of them in fact do, i.e. all of the result could be due to test error.

In other words, when the event is rare the potential error in the test can easily dominate the results of the test.
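
Here is a small Python sketch of the same calculation, offered as an illustration and not as the study's or Bedford's method. It uses the standard correction observed = p*sensitivity + (1-p)*false_positive_rate solved for the true prevalence p. That is slightly more exact than the simplified arithmetic above, which applies the false positive rate to all 1000 people, so the first answer differs a little from 1.25%. The function name is hypothetical.

```python
def implied_prevalence(observed_rate, false_positive_rate, sensitivity):
    """True prevalence p solving: observed = p * sensitivity + (1 - p) * false_positive_rate."""
    p = (observed_rate - false_positive_rate) / (sensitivity - false_positive_rate)
    # A negative value means test error alone can explain every positive result.
    return max(0.0, p)


observed = 0.015  # 1.5% of the sample tested positive
for fpr in (0.005, 0.010, 0.015):
    p = implied_prevalence(observed, fpr, sensitivity=0.8)
    print(f"false positive rate {fpr:.3f} -> implied true prevalence {p:.2%}")
```

Moving the false positive rate from .005 to .015 takes the implied prevalence from a little over 1% down to zero, which is the whole problem when the event being measured is rare.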

Addendum: For those playing at home, Bedford uses sensitivity and specificity while I am more used to thinking about false positive and false negative rates, and I simplify the numbers slightly (.8 instead of his .803, and so forth), but the point is the same.

More on economists and epidemiologists

From my email box, here are perspectives from people in the world of epidemiology, the first being from Jacob Oppenheim:

I’d note that epidemiology is the field that has most embraced novel and principles-driven approaches to causal inference (eg those of Judea Pearl etc).  Pearl’s cluster is at UCLA; there’s one at Berkeley, and another at Harvard.

The one at Harvard simultaneously developed causal methodologies in the ’70s (eg around Rubin), then a parallel approach to Pearl in the ’80s (James Robins and others), leading to a large collection of important epi people at HSPH (Miguel Hernan, etc).  Many of these methods are barely touched in economics, which is unfortunate given their power in causal inference in medicine, disease, and environmental health.

These methods and scientists are very influential not only in public health / traditional epi, but throughout the biopharma and machine learning worlds.  Certainly, in my day job running data science + ml in biotech, many of us would consider well trained epidemiologists from these top schools among the best in the world for quantitative modeling, especially where causality is involved.

From Julien SL:

I’m not an epidemiologist per se, but I think my background gives me some inputs into that discussion. I have a master in Mechatronics/Robotics Engineering, a master in Management Science, and an MBA. However, in the last ten years, epidemiology (and epidemiology forecasting) has figured heavily in my work as a consultant for the pharma industry.

[some data on most of epidemiology not being about pandemic forecasting]…

The result of the neglect of pandemics epidemiology is that there is precious little expertise in pandemics forecasting and prevention. The SIR model (and its variants) that we see a lot these days is a good teaching aid. Still, it’s not practically useful: you can’t fit exponentials with unstable or noisy parameters and expect good predictions. The only way to use R0 is qualitatively. When I saw the first R0 and mortality estimates back in January, I thought “this is going to be bad,” then sold my liquid assets, bought gold, and naked puts on indices. I confess that I didn’t expect it to be quite as bad as what actually happened, or I would have bought more put options.

…here are a few tentative answers about your “rude questions:”

a. As a class of scientists, how much are epidemiologists paid?  Is good or bad news better for their salaries?

Glassdoor data show that epidemiologists in the US are paid $63,911 on average. CDC and FDA both pay better ($98k and $120k), as well as pharma (Merck: $94k-$115k). As explained above, most are working on cancer, diabetes, etc. So I’m not sure what “bad news” would be for them.


b. How smart are they?  What are their average GRE scores?

I’m not sure where you could get data to answer that question. I know that in pharma, many  – maybe most – people who work on epidemiology forecasting don’t have an epidemiology degree. They can have any type of STEM degree, including engineering, economics, etc. So my base rate answer would be average of all STEM GRE scores. [TC: Here are U. Maryland stats for public health students.]

c. Are they hired into thick, liquid academic and institutional markets?  And how meritocratic are those markets?

Compared to who? Epidemiology is a smaller community than economics, so you should find less liquidity. Pharma companies are heavily clustered into few geographies (New Jersey, Basel in Switzerland, Cambridge in the UK, etc.) so private-sector jobs aren’t an option for many epidemiologists.

d. What is their overall track record on predictions, whether before or during this crisis?

CDC has been running flu forecasting challenges every year for years. From what I’ve seen, the models perform reasonably well. It should be noted that those models would seem very familiar to an econometric forecaster: the same time series tools are used in both disciplines. [TC: to be clear, I meant prediction of new pandemics and how they unfold]

e. On average, what is the political orientation of epidemiologists?  And compared to other academics?  Which social welfare function do they use when they make non-trivial recommendations?

Hard to say. Academics lean left, but medical doctors and other healthcare professionals often lean right. There is a conservative bias to medicine, maybe due to the “primo, non nocere” imperative. We see that bias at play in the hydroxychloroquine debate. Most health authorities are reluctant to push – or even allow – a treatment option before they see overwhelming positive proof, even when the emergency should encourage faster decision making.

…g. How well do they understand how to model uncertainty of forecasts, relative to say what a top econometrician would know?

As I mentioned above, forecasting is far from the main focus of epidemiology. However, epidemiologists as a whole don’t seem to be bad statisticians. Judea Pearl has been saying for years that epidemiologists are ahead of econometricians, at least when it comes to applying his own Structural Causal Model framework… (Oldish) link:

I’ve seen a similar pattern with the adoption of agent-based models (common in epidemiology, marginal in economics). Maybe epidemiologists are faster to take up new tools than economists (which maybe also gives a hint about point e?)

h. Are there “zombie epidemiologists” in the manner that Paul Krugman charges there are “zombie economists”?  If so, what do you have to do to earn that designation?  And are the zombies sometimes right, or right on some issues?  How meta-rational are those who allege zombie-ism?

I don’t think so. Epidemiology seems less political than economics. There are no equivalents to Smith, Karl Marx, Hayek, etc.

i. How many of them have studied Philip Tetlock’s work on forecasting?

Probably not many, given that their focus isn’t forecasting. Conversely, I don’t think that Tetlock has paid much attention to epidemiology. On the Good Judgement website, healthcare questions of any type are very rare.

And here is Ruben Conner:

Weighing in on your recent questions about epidemiologists. I did my undergraduate in Economics and then went on for my Masters in Public Health (both at University of Washington). I worked as an epidemiologist for Doctors Without Borders and now work as a consultant at the World Bank (a place mostly run by economists). I’ve had a chance to move between the worlds and I see a few key differences between economists and epidemiologists:

  1. Trust in data: Like the previous poster said, epidemiologists recognize that “data is limited and often inaccurate.” This is really drilled into the epidemiologist training – initial data collection can have various problems and surveys are not always representative of the whole population. Epidemiologists worry about genuine errors in the underlying data. Economists seem to think more about model bias.

  2. Focus on implementation: Epidemiologists expect to be part of the response and to deal with organizing data as it comes in. This isn’t a glamorous process. In addition, the government response can be well executed or poorly run and epidemiologists like to be involved in these details of planning. The knowledge here is practical and hands-on. (Epidemiologists probably could do with more training on organizational management, they’re not always great at this.)

  3. Belief in models: Epidemiologists tend to be skeptical of fancy models. This could be because they have less advanced quantitative training. But it could also be because they don’t have total faith in the underlying data (as noted above) and therefore see fancy specifications as more likely to obscure the truth than reveal it.  Economists often seem to want to fit the data to a particular theory – my impression is that they like thinking in the abstract and applying known theories to their observations.

As with most fields, I think both sides have something to learn from each other! There will be a need to work together as we weigh the economic impacts of suppression strategies. This is particularly crucial in low-income places like India, where the disease suppression strategies will be tremendously costly for people’s daily existence and ability to earn a living.

Here is a 2014 blog post on earlier spats between economists and epidemiologists.  Here is more from Joseph on that topic.

And here is an email from epidemiologist Dylan Green:

So with that…on to the modelers! I’ll merely point out a few important details on modeling which I haven’t seen in response to you yet. First, the urgency with which policy makers are asking for information is tremendous. I’ve been asked to generate modeling results in a matter of weeks (in a disease which I/we know very little about) which I previously would have done over the course of several months, with structured input and validation from collaborators on a disease I have studied for a decade. This ultimately leads to simpler rather than more complicated efforts, as well as difficult decisions in assumptions and parameterization. We do not have the luxury of waiting for better information or improvements in design, even if it takes a matter of days.

Another complicated detail is the publicity of COVID-19 projections. In other arenas (HIV, TB, malaria) model results are generated all the time, from hundreds of research groups, and probably <1% of the population will ever see these figures. Modeling and governance of models of these diseases is advanced. There are well organized consortia who regularly meet to present and compare findings, critically appraise methods, elegantly present uncertainty, and have deep insights into policy implications. In HIV for example, models are routinely parameterized to predict policy impact, and are ex-post validated against empirical findings to determine the best performing models. None of this is currently in scope for COVID-19 (unfortunately), as policy makers often want a single number, not a range, and they want it immediately.

I hope for all of our sakes we will see the modeling coordination efforts in COVID-19 improve. And I ask my fellow epidemiologists to stay humble during this pandemic. For those with little specialty in communicable disease, it is okay to say “this isn’t my area of expertise and I don’t have the answers”. I think there has been too much hubris in the “I-told-ya-so” from people who “said this would happen”, or in knowing the obvious optimal policy. This disease continues to surprise us, and we are learning every day. We must be careful in how we communicate our certainty to policy makers and the public, lest we lose their trust when we are inevitably wrong. I suspect this is something that economists can likely teach us from experience.

One British epidemiologist wrote me and told me they are basically all socialists in the literal sense of the term, not just leaning to the left.

Another person in the area wrote me this:

Another issue that isn’t spoken about a lot is most Epidemiologists are funded by soft money. It makes them terrifyingly hard working but it also makes them worried about making enemies. Every critic now will be reviewed by someone in IHME at some point in an NIH study section, whereas IHME, funded by the Gates Foundation, has a lot of resilience. It makes for a very muted culture of criticism.
Ironically, outsiders (like economist Noah Haber) trying to push up the methods are more likely to be attacked because they are not a part of the constant funding cycle.
I wonder if economists have ever looked at the potential perverse incentives of being fully grant funded on academic criticism?

Here is an earlier email response I reproduced, here is my original blog post, here is my update from yesterday.

Estimating the COVID-19 Infection Rate: Anatomy of an Inference Problem

That is a recent paper by Manski and Molinari, top people in econometrics.  Here is the abstract:

As a consequence of missing data on tests for infection and imperfect accuracy of tests, reported rates of population infection by the SARS CoV-2 virus are lower than actual rates of infection. Hence, reported rates of severe illness conditional on infection are higher than actual rates. Understanding the time path of the COVID-19 pandemic has been hampered by the absence of bounds on infection rates that are credible and informative. This paper explains the logical problem of bounding these rates and reports illustrative findings, using data from Illinois, New York, and Italy. We combine the data with assumptions on the infection rate in the untested population and on the accuracy of the tests that appear credible in the current context. We find that the infection rate might be substantially higher than reported. We also find that the infection fatality rate in Italy is substantially lower than reported.

Here is a very good tweet storm on their methods, excerpt: “What I love about this paper is its humility in the face of uncertainty.”  And: “…rather than trying to get exact answers using strong assumptions about who opts-in for testing, the characteristics of the tests themselves, etc, they start with what we can credibly know about each to build bounds on each of these quantities of interest.”
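
The bounding idea can be sketched in a few lines of Python. This is a simplified illustration and not the paper's actual procedure: it assumes perfectly accurate tests and uses only the law of total probability, with the unknown infection rate among the untested allowed to range over an assumed interval. The function name and all numbers are hypothetical.

```python
def infection_rate_bounds(tested_share, positive_rate_tested, untested_low, untested_high):
    """Bounds on the population infection rate when the untested rate is only known to lie in an interval.

    Population rate = P(infected | tested) * P(tested) + P(infected | untested) * P(untested).
    """
    untested_share = 1.0 - tested_share
    low = positive_rate_tested * tested_share + untested_low * untested_share
    high = positive_rate_tested * tested_share + untested_high * untested_share
    return low, high


# Hypothetical inputs: 1% of people tested, 20% of those tests positive, and the
# untested assumed to be infected at a rate between zero and the rate among the tested.
low, high = infection_rate_bounds(0.01, 0.20, untested_low=0.0, untested_high=0.20)
print(f"infection rate is between {low:.2%} and {high:.2%}")
```

The wide interval is the point: without a credible assumption about the untested population, the data alone pin down very little, and any narrower answer comes from the assumptions.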

I genuinely cannot give a coherent account of “what is going on” with Covid-19 data issues and prevalence.  But at this point I think it is safe to say that the mainstream story we have been living with for some number of weeks now just isn’t holding up.

For the pointer I thank David Joslin.


A widely followed model for projecting Covid-19 deaths in the U.S. is producing results that have been bouncing up and down like an unpredictable fever, and now epidemiologists are criticizing it as flawed and misleading for both the public and policy makers. In particular, they warn against relying on it as the basis for government decision-making, including on “re-opening America.”

“It’s not a model that most of us in the infectious disease epidemiology field think is well suited” to projecting Covid-19 deaths, epidemiologist Marc Lipsitch of the Harvard T.H. Chan School of Public Health told reporters this week, referring to projections by the Institute for Health Metrics and Evaluation at the University of Washington.

Other experts, including some colleagues of the model-makers, are even harsher. “That the IHME model keeps changing is evidence of its lack of reliability as a predictive tool,” said epidemiologist Ruth Etzioni of the Fred Hutchinson Cancer Center, home to several of the researchers who created the model, and who has served on a search committee for IHME. “That it is being used for policy decisions and its results interpreted wrongly is a travesty unfolding before our eyes.”

…The chief reason the IHME projections worry some experts, Etzioni said, is that “the fact that they overshot” — initially projecting up to 240,000 U.S. deaths, compared with fewer than 70,000 now — “will be used to suggest that the government response prevented an even greater catastrophe, when in fact the predictions were shaky in the first place.”

Here is the full story, from StatNews, by Sharon Begley with assistance from Helen Branswell, two very good and knowledgeable sources.  Via Matt Yglesias.

To be clear, I am (and always have been) fully aware that there are more nuanced epidemiological models “sitting on the shelf,” just as is true for macroeconomics and many other areas.  But I ask you, where are the numerous cases of leading epidemiologists screaming bloody murder to the press, or on their blogs, or in any other manner, that the most commonly used model for this all-important policy analysis is deeply wrong and in some regards close to a fraud?  Yes I know you can point to a few tweets from the more serious people, but where has the profession as a whole been?  Who organized the protest letter and petition to The Wall Street Journal?

And to be clear, I have heard this model cited and discussed in many (off the record) policy discussions, this is not just something you can pin on the Trump administration narrowly construed (though they are at fault as well).

Fast Grants, a project of Emergent Ventures against Covid-19, status update

As you may recall, the goal of Fast Grants is to support biomedical research to fight back Covid-19, thus restoring prosperity and liberty.

Yesterday 40 awards were made, totaling about $7 million, and money is already going out the door with ongoing transfers today.  Winners are from MIT, Harvard, Stanford, Rockefeller University, UCSF, UC Berkeley, Yale, Oxford, and other locales of note.  The applications are of remarkably high quality.

Nearly 4000 applications have been turned down, and many others are being put in touch with other institutions for possible funding support, with that ancillary number set to top $5 million.

The project was announced April 8, 2020, only eight days ago.  And Fast Grants was conceived of only about a week before that, and with zero dedicated funding at the time.

I wish to thank everyone who has worked so hard to make this a reality, including the very generous donors to the program, those at Stripe who contributed by writing new software, the quality-conscious and conscientious referees and academic panel members (about twenty of them), and my co-workers at Mercatus at George Mason University, which is home to Emergent Ventures.

I hope soon to give you an update on some of the supported projects.

Emergent Ventures Covid-19 prizes, second cohort

There is another round of prize winners, and I am pleased and honored to announce them:

1. Petr Ludwig.

Petr has been instrumental in building out the #Masks4All movement, and in persuading individuals in the Czech Republic, and in turn the world, to wear masks.  That already has saved numerous lives and made possible — whenever the time is right — an eventual reopening of economies.  And I am pleased to see this movement is now having an impact in the United States.

Here is Petr on Twitter, here is the viral video he had a hand in creating and promoting, his work has been truly impressive, and I also would like to offer praise and recognition to all of the people who have worked with him.


2. The covid19india project.

The covid19india project is a website for tracking the progress of Covid-19 cases through India, and it is the result of a collaboration.

It is based on a large volunteer group that is rapidly aggregating and verifying patient-level data by crowdsourcing. They run a website for tracking the progress of Covid-19 cases through India and open-source all the (non-personally identifiable) data for researchers and analysts to consume. The data for the React-based website and the cluster graph come from a crowdsourced Google Sheet filled in by a large and hardworking Ops team at covid19india. They manually fill in each case, from various news sources, as soon as the case is reported. The top contributor among 100-odd other code contributors, and the maintainer of the website, is Jeremy Philemon, an undergraduate at SUNY Binghamton, majoring in Computer Science. Another interesting contribution is from Somesh Kar, a 15-year-old high school student at Delhi Public School RK Puram, New Delhi. For the COVID-19 India tracker he worked on the code for the cluster graph. He is interested in computer science and tech entrepreneurship and is a designer and developer in his free time. Somesh was joined in this effort by his brother, Sibesh Kar, a tech entrepreneur in New Delhi and the founder of MayaHQ.

3. Debes Christiansen, the head of department at the National Reference Laboratory for Fish and Animal Diseases in the capital, Tórshavn, Faroe Islands.

Here is the story of Debes Christiansen.  Here is one part:

A scientist who adapted his veterinary lab to test for disease among humans rather than salmon is being celebrated for helping the Faroe Islands avoid coronavirus deaths, where a larger proportion of the population has been tested than anywhere in the world.

Debes was prescient in understanding the import of testing, and also in realizing in January that he needed to move quickly.

Please note that I am trying to reach Debes Christiansen — can anyone please help me in this endeavor with an email?

Here is the list of the first cohort of winners, here is the original prize announcement.  Most of the prize money still remains open to be won.  It is worth noting that the winners so far are taking the money and plowing it back into their ongoing and still very valuable work.

An econometrician on the SEIRD epidemiological model for Covid-19

There is a new paper by Ivan Korolev:

This paper studies the SEIRD epidemic model for COVID-19. First, I show that the model is poorly identified from the observed number of deaths and confirmed cases. There are many sets of parameters that are observationally equivalent in the short run but lead to markedly different long run forecasts. Second, I demonstrate using the data from Iceland that auxiliary information from random tests can be used to calibrate the initial parameters of the model and reduce the range of possible forecasts about the future number of deaths. Finally, I show that the basic reproduction number R0 can be identified from the data, conditional on the clinical parameters. I then estimate it for the US and several other countries, allowing for possible underreporting of the number of cases. The resulting estimates of R0 are heterogeneous across countries: they are 2-3 times higher for Western countries than for Asian countries. I demonstrate that if one fails to take underreporting into account and estimates R0 from the cases data, the resulting estimate of R0 will be biased downward and the model will fail to fit the observed data.

Here is the full paper.  And here is Ivan’s brief supplemental note on CFR.  (By the way, here is a new and related Andrew Atkeson paper on estimating the fatality rate.)
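
The identification problem in the abstract can be illustrated with a toy Python simulation. This is a sketch using the plain SIR model, not the paper's SEIRD model or its estimation method, and the parameter values are hypothetical. The two parameter sets below share the same early growth rate (beta - gamma = 0.2 per day), so the infectious share looks nearly identical after 20 days, yet they imply different R0 (2 versus 3) and different long-run outcomes.

```python
def simulate_sir(beta, gamma, days, initial_infectious=1e-6, dt=0.1):
    """Euler-step SIR simulation on population shares.

    Returns (infectious share at the end of each day, share ever infected by the end).
    """
    susceptible = 1.0 - initial_infectious
    infectious = initial_infectious
    infectious_by_day = [infectious]
    steps_per_day = round(1.0 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * susceptible * infectious * dt
            recoveries = gamma * infectious * dt
            susceptible -= new_infections
            infectious += new_infections - recoveries
        infectious_by_day.append(infectious)
    return infectious_by_day, 1.0 - susceptible


# Two hypothetical parameter sets with the same early growth rate but different R0.
path_a, final_a = simulate_sir(beta=0.4, gamma=0.2, days=400)  # R0 = 2
path_b, final_b = simulate_sir(beta=0.3, gamma=0.1, days=400)  # R0 = 3
print(f"infectious share on day 20: {path_a[20]:.3e} vs {path_b[20]:.3e}")
print(f"share ever infected after 400 days: {final_a:.2f} vs {final_b:.2f}")
```

Early data on the epidemic's growth cannot distinguish the two runs, which is why outside information, such as the random testing data from Iceland mentioned in the abstract, is needed to narrow the range of forecasts.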

And here is a further paper on the IHME model, by statisticians from CTDS, Northwestern University and the University of Texas, excerpt from the opener:

  • In excess of 70% of US states had actual death rates falling outside the 95% prediction interval for that state, (see Figure 1)
  • The ability of the model to make accurate predictions decreases with increasing amount of data. (figure 2)

Again, I am very happy to present counter evidence to these arguments.  I readily admit this is outside my area of expertise, but I have read through the paper and it is not much more than a few pages of recording numbers and comparing them to the actual outcomes (you will note the model predicts New York fairly well, and thus the predictions are of a “train wreck” nature).

Let me just repeat the two central findings again:

  • In excess of 70% of US states had actual death rates falling outside the 95% prediction interval for that state, (see Figure 1)
  • The ability of the model to make accurate predictions decreases with increasing amount of data. (figure 2)
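
To get a feel for how far off that first finding is, here is a back-of-the-envelope Python sketch. It is not a calculation from the paper. It assumes 50 states and, unrealistically, that forecast errors are independent across states. Both are assumptions for illustration. Under well-calibrated 95% intervals only about 5% of states should fall outside.

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial probability of k or more misses out of n when each miss has probability p."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


n_states = 50            # assumption: 50 states
miss_probability = 0.05  # a calibrated 95% interval misses 5% of the time
observed_misses = 35     # 70% of 50 states

expected_misses = n_states * miss_probability
tail = prob_at_least(observed_misses, n_states, miss_probability)
print(f"expected misses if calibrated: {expected_misses}")
print(f"chance of {observed_misses} or more misses if calibrated: {tail:.1e}")
```

Correlated errors across states would make a large number of simultaneous misses more likely than this independent-states figure suggests, but the gap between roughly 2.5 expected misses and 35 observed is the relevant comparison.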

So now really is the time to be asking tough questions about epidemiology, and yes, epidemiologists.  I would very gladly publish and “signal boost” the best positive response possible.

And just to be clear (again), I fully support current lockdown efforts (best choice until we have more data and also a better theory), I don’t want Fauci to be fired, and I don’t think economists are necessarily better forecasters.  I do feel I am not getting straight answers.

From my email, a note about epidemiology

This is all from my correspondent, I won’t do any further indentation and I have removed some identifying information, here goes:

“First, some background on who I am.  After taking degrees in math and civil engineering at [very very good school], I studied infectious disease epidemiology at [another very, very good school] because I thought it would make for a fulfilling career.  However, I became disillusioned with the enterprise for three reasons:

  1. Data is limited and often inaccurate in the critical forecasting window, leading to very large confidence bands for predictions
  2. Unless the disease has been seen before, the underlying dynamics may be sufficiently vague to make your predictions totally useless if you do not correctly specify the model structure
  3. Modeling is secondary to the governmental response (e.g., effective contact tracing) and individual action (e.g., social distancing, wearing masks)

Now I work as a quantitative analyst for [very, very good firm], and I don’t regret leaving epidemiology behind.  Anyway, on to your questions…

What is an epidemiologist’s pay structure?

The vast majority of trained epidemiologists who would have the necessary knowledge to build models are employed in academia or the public sector; so their pay is generally average/below average for what you would expect in the private sector for the same quantitative skill set.  So, aside from reputational enhancement/degradation, there’s not much of an incentive to produce accurate epidemic forecasts – at least not in monetary terms.  Presumably there is better money to be made running clinical trials for drug companies.

On your question about hiring, I can’t say how meritocratic the labor market is for quantitative modelers.  I can say though that there is no central lodestar, like Navier-Stokes in fluid dynamics, that guides the modeling framework.  True, SIR, SEIR, and other compartmental models are widely used and accepted; however, the innovations attached to them can be numerous in a way that does not suggest parsimony.

How smart are epidemiologists?

The quantitative modelers are generally much smarter than the people performing contact tracing or qualitative epidemiology studies.  However, if I’m being completely honest, their intelligence is probably lower than the average engineering professor – and certainly below that of mathematicians and statisticians.

My GRE scores were very good, and I found epidemiology to be a very interesting subject – plus, I can be pretty oblivious to what other people think.  Yet when I told several of my professors in math and engineering of my plans, it was hard for me to miss their looks of disappointment.  It’s just not a track that driven, intelligent people with a hint of quantitative ability take.

What is the political orientation of epidemiologists?  What is their social welfare function?

Left, left, left.  In the United States, I would be shocked if more than 2-5% of epidemiologists voted for Republicans in 2016 – at least among academics.  At [aforementioned very very good school], I’d be surprised if the number was 1%.  I remember the various unprompted bashing of Trump and generic Republicans on political matters unrelated to epidemiology in at least four classes during the 2016-17 academic year.  Add that to the (literal) days of mourning after the election, it’s fair to say that academic epidemiologists are pretty solidly in the left-wing camp. (Note: I didn’t vote for Trump or any other Republican in 2016 or 2018)

I was pleasantly surprised during my time at [very, very good school] that there was at least some discussion of cost-benefit analysis for public health actions, including quarantine procedures.  Realistically though, there’s a dominant strain of thought that the economic costs of an action are secondary to stopping the spread of an epidemic.  To summarize the SWF: damn the torpedoes, full steam ahead!

Do epidemiologists perform uncertainty quantification?

They seem to play around with tools like the ensemble Kalman filter (found in weather forecasting) and stochastic differential equations, but it’s fair to say that mechanical engineers are much better at accounting for uncertainty (especially in parameters and boundary conditions) in their simulations than epidemiologists.  By extension, that probably means that econometricians are better too.”

TC again: I am happy to pass along other well-thought-out perspectives on this matter, and I would like to hear a more positive take.  Please note I am not endorsing these (or subsequent) observations, I genuinely do not know, and I will repeat I do not think economists are likely better.  It simply seems to me that “who are these epidemiologists anyway?” is a question now worth addressing, and hardly anyone is willing to do that.

As an opening gambit, I’d like to propose that we pay epidemiologists more.  (And one of my correspondents points out they are too often paid on “soft money.”)  I know, I know, this plays with your mood affiliation.  You would like to find a way of endorsing that conclusion, without simultaneously admitting that right now maybe the quality isn’t quite high enough.

Epidemiology and selection problems and further heterogeneities

Richard Lowery emails me this:

I saw your post about epidemiologists today.  I have a concern similar to point 4 about selection, based on what I have seen being used for policy in Austin.  It looks to me like the models being used for projection calibrate R_0 off of the initial doubling rate of the outbreak in an area.  But, if people who are most likely to spread to a large number of people are also more likely to get infected early in an outbreak, you end up with what looks kind of like a classic Heckman selection problem, right? In any observable group, there is going to be an unobserved distribution of contact frequency, and it would seem potentially first order to account for that.

As far as I can tell, if this criticism holds, the models are going to (1) be biased upward, predicting a far higher peak in the absence of policy intervention and (2) overstate the likely severity of an outcome without policy intervention, while potentially understating the value of aggressive containment measures.  The epidemiology models I have seen look really pessimistic, and they seem like they can only justify any intervention by arguing that the health sector will be overwhelmed, which now appears unlikely in a lot of places.  The Austin report did a trick of cutting off the time axis to hide that total infections do not seem to change that much under the different social distancing policies; everything just gets dragged out.

But, if the selection concern is right, the pessimism might be misplaced if the late-epidemic R_0 is lower, potentially leading to a much lower effective spread rate and the possibility of killing the thing off at some point before it infects the number of people required to create the level of immunity the models predict is required.  This seems feasible based on South Korea and maybe China, at least for areas in the US that are not already out of control.
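To make the calibration step in Lowery's email concrete, here is a minimal sketch of how an R_0 can be read off an early doubling time.  It assumes the simplest SIR setup, where early growth is exponential at rate r and R_0 = 1 + r × (mean infectious period); the function names and the example numbers are hypothetical, and the models used in Austin may well calibrate differently.

```python
from math import log

def r0_from_doubling_time(doubling_days, infectious_days):
    """R0 implied by early exponential growth in a simple SIR model.

    Assumes growth rate r = ln(2) / doubling time and R0 = 1 + r * D,
    where D is the mean infectious period.
    """
    growth_rate = log(2) / doubling_days
    return 1.0 + growth_rate * infectious_days

def herd_immunity_share(r0):
    """Homogeneous-mixing herd immunity threshold, 1 - 1/R0."""
    return max(0.0, 1.0 - 1.0 / r0)

# Hypothetical inputs: the same infectious period, two observed doubling times.
for doubling_days in (3.0, 6.0):
    r0 = r0_from_doubling_time(doubling_days, infectious_days=5.0)
    print(doubling_days, round(r0, 2), round(herd_immunity_share(r0), 2))
```

The point of the sketch is only that everything downstream hangs on the early doubling time.  If that early growth is driven disproportionately by high-contact people, the implied R_0, and the immunity threshold computed from it, are inflated relative to what the rest of the population would sustain.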

I do not know the answers to the questions raised here, but I do see the debate on Twitter becoming more partisan, more emotional, and less substantive.  You cannot say that about this communication.  From the MR comments this one — from Kronrad — struck me as significant:

One thing both economists and epidemiologists seem to be lacking is an awareness for the problems of aggregation. Most models in both fields see the population as one homogenous mass of individuals. But sometimes, individual variation makes a difference in the aggregate, even if the average is the same.

In the case of pandemics, it makes a big difference how the infection rate varies in the population. Most models assume that it is the same for everyone. But in reality, human interactions are not evenly distributed. Some people shake hands all day, while others spend their days mostly alone in front of a screen. This uneven distribution has an interesting effect: those who spread the virus the most are also the most likely to get it. This means that the infection rate looks much higher at the beginning of a pandemic, but falls once the super spreaders have had the disease and gained immunity. Also, it means herd immunity is reached much earlier: not after 70% of the population is immune, but after people who are involved in 70% of all human interactions are immune. On average, this is the same. But in practice, it can make a big difference.

I did a small simulation on this and came to the conclusion that with a recursively applied Pareto distribution in which 1/3 of all people are responsible for 2/3 of all human interaction, herd immunity is already reached when 10% of the population has had the virus. So individual variation in the infection rate can make an enormous difference that is not captured in aggregate models.

My quick and dirty simulation can be found here:
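Separately from Kronrad's own simulation, the mechanism he describes can be sketched in a few lines.  This is an illustration, not his code: a two-group SIR-style model with proportionate mixing, in which one third of the population accounts for two thirds of all contacts (an activity ratio of 4 to 1).  The group sizes, R_0 = 3, the recovery rate, and the function name are all assumptions chosen for illustration, and the single two-group split stands in for his recursive Pareto distribution, so it should not be expected to reproduce his 10% figure.

```python
def herd_immunity_threshold(shares, activity, r0=3.0, gamma=0.2,
                            dt=0.01, seed=1e-6, max_days=2000):
    """Share of the population infected when effective R first falls to 1.

    shares   : population fraction in each group (sums to 1)
    activity : relative contact rate of each group
    Contacts are mixed in proportion to activity; beta is scaled so that
    every contact structure starts from the same R0.
    """
    mean_a = sum(n * a for n, a in zip(shares, activity))
    mean_a2 = sum(n * a * a for n, a in zip(shares, activity))
    beta = r0 * gamma * mean_a / mean_a2

    s = [n * (1.0 - seed) for n in shares]   # susceptible, as share of total
    i = [n * seed for n in shares]           # infectious, as share of total

    for _ in range(int(max_days / dt)):
        r_eff = (beta / gamma) * sum(a * a * sk for a, sk in zip(activity, s)) / mean_a
        if r_eff <= 1.0:
            return 1.0 - sum(s)
        # Infection pressure per unit of activity, then simple Euler steps.
        pressure = beta * sum(a * ik for a, ik in zip(activity, i)) / mean_a
        new_inf = [pressure * a * sk * dt for a, sk in zip(activity, s)]
        s = [sk - x for sk, x in zip(s, new_inf)]
        i = [ik + x - gamma * ik * dt for ik, x in zip(i, new_inf)]
    raise RuntimeError("effective R never fell to 1 within max_days")

homogeneous = herd_immunity_threshold([1.0], [1.0])
two_group = herd_immunity_threshold([1 / 3, 2 / 3], [4.0, 1.0])
print(f"homogeneous: {homogeneous:.2f}, two-group: {two_group:.2f}")
```

Under these assumptions the homogeneous case crosses the threshold at about 67% infected (the textbook 1 − 1/R_0), while the two-group case crosses at roughly 42%, because the high-contact third is infected first and takes most of the transmission potential with it.  More skewed contact distributions push the threshold lower still.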

See also Robin Hanson’s earlier post on variation in R0.  C’mon people, stop your yapping on Twitter and write some decent blog posts on these issues.  I know you can do it.

What does this economist think of epidemiologists?

I have had fringe contact with more epidemiology than usual as of late, for obvious reasons, and I do understand this is only one corner of the discipline.  I don’t mean this as a complaint dump, because most of economics suffers from similar problems, but here are a few limitations I see in the mainline epidemiological models put before us:

1. They do not sufficiently grasp that long-run elasticities of adjustment are more powerful than short-run elasticities.  In the short run you socially distance, but in the long run you learn which methods of social distance protect you the most.  Or you move from doing “half home delivery of food” to “full home delivery of food” once you get that extra credit card or learn the best sites.  In this regard the epidemiological models end up being too pessimistic, and it seems that “the natural disaster economist complaints about the epidemiologists” (yes there is such a thing) are largely correct on this count.  On this question economic models really do better, though not the models of everybody.

2. They do not sufficiently incorporate public choice considerations.  An epidemic path, for instance, may be politically infeasible, which leads to adjustments along the way, and very often those adjustments are stupid policy moves from impatient politicians.  This is not built into the models I am seeing, nor are such factors built into most economic macro models, even though there is a large independent branch of public choice research.  It is hard to integrate.  Still, it means that epidemiological models will be too optimistic, rather than too pessimistic as in #1.  Epidemiologists might protest that it is not the purpose of their science or models to incorporate politics, but these factors are relevant for prediction, and if you try to wash your hands of them (no pun intended) you will be wrong a lot.

3. The Lucas critique, namely that agents within a model, knowing the model, will change how the model itself operates.  Epidemiologists seem super-aware of this, much more than Keynesian macroeconomists are these days, though it seems to be more of an “I told you that you should listen to us” embodiment than trying to find an actual closed-loop solution for the model as a whole.  That is really hard, either in macroeconomics or epidemiology.  Still, on the predictive front without a good instantiation of the Lucas critique again a lot will go askew, as indeed it does in economics.

The epidemiological models also do not seem to incorporate Sam Peltzman-like risk offset effects.  If you tell everyone to wear a mask, great!  But people will feel safer as a result, and end up going out more.  Some of the initial safety gains are given back through the subsequent behavioral adjustment.  Epidemiologists might claim these factors already are incorporated in the variables they are measuring, but they are not constant across all possible methods of safety improvement.  Ideally you may wish to make people safer in a not entirely transparent manner, so that they do not respond with greater recklessness.  I have not yet seen a Straussian dimension in the models, though you might argue many epidemiologists are “naive Straussian” in their public rhetoric, saying what is good for us rather than telling the whole truth.  The Straussian economists are slightly subtler.

4. Selection bias from the failures coming first.  The early models were calibrated from Wuhan data, because what else could they do?  Then came northern Italy, which was also a mess.  It is the messes which are visible first, at least on average.  So some of the models may have been too pessimistic at first.  These days we have Germany, Australia, and a bunch of southern states that haven’t quite “blown up” as quickly as they should have.  If the early models had access to all of that data, presumably they would be more predictive of the entire situation today.  But it is no accident that the failures will be more visible early on.

And note that right now some of the very worst countries (Mexico, Brazil, possibly India?) are not far enough along on the data side to yield useful inputs into the models.  So currently those models might be picking up too many semi-positive data points and not enough from the “train wrecks,” and thus they are too optimistic.

On this list, I think my #1 comes closest to being an actual criticism, the other points are more like observations about doing science in a messy, imperfect world.  In any case, when epidemiological models are brandished, keep these limitations in mind.  But the more important point may be for when critics of epidemiological models raise the limitations of those models.  Very often the cited criticisms are chosen selectively, to support some particular agenda, when in fact the biases in the epidemiological models could run in either an optimistic or pessimistic direction.

Which is how it should be.

Now, to close, I have a few rude questions that nobody else seems willing to ask, and I genuinely do not know the answers to these:

a. As a class of scientists, how much are epidemiologists paid?  Is good or bad news better for their salaries?

b. How smart are they?  What are their average GRE scores?

c. Are they hired into thick, liquid academic and institutional markets?  And how meritocratic are those markets?

d. What is their overall track record on predictions, whether before or during this crisis?

e. On average, what is the political orientation of epidemiologists?  And compared to other academics?  Which social welfare function do they use when they make non-trivial recommendations?

f. We know, from economics, that if you are a French economist, being a Frenchman predicts your political views better than does being an economist (there is an old MR post on this somewhere).  Is there a comparable phenomenon in epidemiology?

g. How well do they understand how to model uncertainty of forecasts, relative to say what a top econometrician would know?

h. Are there “zombie epidemiologists” in the manner that Paul Krugman charges there are “zombie economists”?  If so, what do you have to do to earn that designation?  And are the zombies sometimes right, or right on some issues?  How meta-rational are those who allege zombie-ism?

i. How many of them have studied Philip Tetlock’s work on forecasting?

Just to be clear, as MR readers will know, I have not been criticizing the mainstream epidemiological recommendations of lockdowns.  But still those seem to be questions worth asking.

What should I ask Adam Tooze?

I will be doing a Conversation with him, no associated public event.  He has been tweeting about the risks of a financial crisis during Covid-19, but more generally he is one of the most influential historians, currently being a Professor at Columbia University.  His previous books cover German economic history, German statistical history, the financial crisis of 2008, and most generally early to mid-20th century European history.  Here is his home page, here is his bio, here is his Wikipedia page.

So what should I ask him?

Fast Grants against Covid-19, an extension of Emergent Ventures

Emergent Ventures, a project of the Mercatus Center at George Mason University, is leading a new “Fast Grants” program to support research to fight Covid-19.  Here is the bottom line:

Science funding mechanisms are too slow in normal times and may be much too slow during the COVID-19 pandemic. Fast Grants are an effort to correct this.

If you are a scientist at an academic institution currently working on a COVID-19 related project and in need of funding, we invite you to apply for a Fast Grant. Fast grants are $10k to $500k and decisions are made in under 48 hours. If we approve the grant, you’ll receive payment as quickly as your university can receive it.

More than $10 million in support is available in total, and that is in addition to earlier funds raised to support prizes.  The application site has further detail and explains the process and motivation.

I very much wish to thank John Collison, Patrick Collison, Paul Graham, Reid Hoffman, Fiona McKean and Tobias Lütke, Yuri and Julia Milner, and Chris and Crystal Sacca for their generous support of this initiative, and I am honored to be a part of it.

Meanwhile, elsewhere in the world (FT):

The president of the European Research Council — the EU’s top scientist — has resigned after failing to persuade Brussels to set up a large-scale scientific programme to fight Covid-19.

In contrast:

During World War II, the NDRC accomplished a lot of research very quickly. In his memoir, Vannevar Bush recounts: “Within a week NDRC could review the project. The next day the director could authorize, the business office could send out a letter of intent, and the actual work could start.” Fast Grants are an effort to unlock progress at a cadence similar to that which served us well then.

We are not able at this time to process small donations for this project, but if you are an interested donor, please reach out to [email protected].

Emergent Ventures winners, eighth cohort

Eibhlin Lim, Penang and University of Chicago.

“I interview founders from different industries and around the globe and share their origin stories to inspire the next generation of founders to reach for their own dreams. I previously shared these stories in Phoenix Newsletters, an online newsletter that organically grew to serve more than 7000 high school and university student subscribers primarily from Malaysia. In July 2018, I decided to self-publish and distribute a book, ‘The Phoenix Perspective’, which contains some of the most loved stories from Phoenix Newsletters, after learning that some of our biggest fans did not have constant access to the Internet and went through great lengths to read the stories. With the help of founders and organizations, I managed to bring this book to these youths and also 1000+ other youths from 20+ countries around the globe. I hope to be able to continue interviewing founders and share their origin stories, on a new website, to reach even more future founders from around the world.”

Carole Treston/Association of Nurses in AIDS Care

To jump-start a Covid-19 program to produce cheap informational videos and distribute them to their nurse network for better information and greater safety, including for patients.

Kyle Redelinghuys

“Right now, the main sources of data for Coronavirus are CSV files and websites which make the data fairly inaccessible to work with for developers. By giving easy access to this data more products can be built and more information can be shared. The API I built is an easily accessible, single source of Coronavirus data to enable developers to build new products based on COVID19 data. These products could be mobile applications, web applications and graphed data…The API exposes this data in JSON which is the easiest data format to work with for web and mobile developers. This in turn allows for quick integration in to any products. The API is also completely free to users.”

Seyone Chithrananda

A 17-year-old from Ontario who wishes to work in San Francisco.  He does computational biology, with possible application to Covid-19 as well; Twitter here.  His Project De Novo uses molecular machine learning methods for novel small molecule discovery, and the grant will be used to scale up the cloud computing infrastructure and purchase chemical modelling software.

Joshua Broggi, Woolf University

To build an on-line university to bring learning programs to the entire world, including to businesses but by no means only.  His background is in philosophy and German thought, and now he is seeking to change the world.


There is also another winner, but the nature of that person’s job means that reporting must be postponed.

Here are previous Emergent Ventures winners, here is an early post on the philosophy of Emergent Ventures.  You will note that the Covid-19-related work here is simply winning regular EV grants, these are not the prizes I outlined a short while ago.  I expect more prize winners to be announced fairly soon.