Pooling to multiply SARS-CoV-2 testing throughput

Here is an email from Kevin Patrick Mahaffey, and I would like to hear your views on whether this makes sense:

One question I don’t hear being asked: Can we use pooling to repeatedly test the entire labor force at low cost with limited SARS-CoV-2 testing supplies?

Pooling is a technique used elsewhere in pathogen detection where multiple samples (e.g. nasal swabs) are combined (perhaps after the RNA extraction step of RT-qPCR) and run as one assay. A negative result confirms no infection of the entire pool, but a positive result indicates “one or more of the pool is infected.” If this is the case, then each individual in the pool can receive their own test (or, if we’re getting fancy [read: probably too hard to implement in the real world], perform an efficient search of the space using sub-pools).

To me, at least, the key questions seem to be:

– Are current assays sensitive enough to work? Technion researchers report yes in a pool as large as 60.

– Can we align limiting factors in testing cost/velocity with pooled steps? For example, if nasal swabs are the limiting reagent, then pooling doesn’t help; however if PCR primers and probes are limiting it’s great.
– Can we get a regulatory allowance for this? Perhaps the hardest step.

Example (readers, please check my back-of-the-envelope math): If we assume base infection rate of the population is 1%, then pooling of 11 samples has a ~10% chance of coming out positive. If you run all positive pools through individual assays, the expected number of tests per person is 0.196 or a 5.1x multiple on testing throughput (and a 5.1x reduction in cost). This is a big deal.

If we look at this from the view of whole-population biosurveillance after the outbreak period is over and we have a 0.1% base infection rate, pools of 32 samples have an expected number of tests per person at 0.0628 or a 15.9x multiple on throughput/cost reduction.

Putting prices on this, an initial whole-US screen at 1% rate would require about 64M tests. Afterward, performing periodic biosurveillance to find hot spots requires about 21M tests per whole-population screen. At $10/assay (what some folks working on in-field RT-qPCR tests believe marginal cost could be), this is orders of magnitude less expensive than mitigations that deal with a closed economy for any extended period of time.

I’m neither a policy nor medical expert, so perhaps I’m missing something big here. Is there really $20 on the ground or [something something] efficient market?

By the way, Iceland is testing many people and trying to build up representative samples.


Would pooling affect the likelihood of false positives? Or false negatives?

The irony of COVID is that medical bills are the #1 cause of bankruptcy in the US, so it's fitting that the whole country will soon be bankrupt because of a medical emergency. Nothing could be more American.

It’s not even remotely close to the #1 cause of bankruptcy.

So it’s not ironic, you’re just incorrect.

Your battle, young Skeptical padawan, is with the American Journal of Public Health:

"... because of one reason: health-care costs. A new study from academic researchers found that 66.5 percent of all bankruptcies were tied to medical issues —either because of high costs for care or time out of work. "


Note their use of weasel words: “tied to medical issues,” not caused by. So if you have $100k in mortgage debt and $100 in medical debt, it counts as ‘medical bankruptcy’ because you had some medical debt when you filed for bankruptcy. Worse yet, studies finding such ridiculous percentages are usually based on vague survey questions like “have you experienced financial stress or lost income due to medical bills,” which most people in general might answer yes to.

In reality, medical bankruptcy accounts for around 5% of bankruptcies; your battle, it seems, is with the New England Journal of Medicine: https://www.nejm.org/doi/pdf/10.1056/NEJMp1716604

"The research also noted that 58.5 percent of bankruptcies were caused specifically by medical bills, while 44.3 percent were caused in part by income loss due to illness."

Better? They said "caused by."

No, it's not any better. If someone already has accumulated most of their debts from other things and a medical bill is the straw that breaks the camel's back, then the medical bill is only the proximate cause, not the primary cause.

Your battle is with the ridiculously loose definition of bankruptcy "tied" to medical issues. "This approach assumes that whenever a person who reports having substantial medical bills experiences a bankruptcy, the bankruptcy was caused by the medical debt. The fact that, according to a 2014 report from the Consumer Financial Protection Bureau, about 20% of Americans have substantial medical debt yet in a given year less than 1% of Americans file for personal bankruptcy suggests that this assumption is problematic. Clearly, many people face medical debt but do not go bankrupt. Even after correcting for overly broad definitions of “medical” expenses, the existing, widely cited evidence on medical bankruptcy is built on the fallacy that when two things occur together there is necessarily a causal relationship between them."


Who is going to collect 330,000,000 samples?

The post office!

Just lick a stamp and return the envelope...

Sure, it could be done in theory, but the turnaround would be about 6 months to a year.

I forget whether it was on the Saturday or Sunday CV Task Force press conference, but they are up to 195K tests with results and may be up to 1M by the end of the week. These are people who suspected that they were infected, but it is reported that only 10% test positive.

I don't think 195K tests = 195K people tested. I believe they run two tests for each person, in separate batches, to improve reliability.

Swabs are the current limit or so we're told.

Swabs are a constraint. PPE is as well; the collector needs to be in PPE. Less of an issue if we are collecting on a batch basis.

Self-collection requires no PPE.

My understanding is that nasal swab is not the required collection site but that collection needs to be via the nasopharyngeal cavity - i.e. the back of the throat. This is why PPE is required because most people gag or cough when a swab is put down their throat. If this is true then self-collection would not work.

Look up a picture of a nasopharyngeal swab and then tell me you could do that yourself:


He's saying that you can't do it yourself.

" self-collection would not work."

No, the point is that self-collection wouldn't work!

It's a new protocol, so would need to be validated against existing protocols to ensure the accuracy is similar. It would temporarily require increased operator time and reagents for validation, but could pay off in the future.

It might work if the CDC had field offices and localized testing ability in each zip code.

However, for tracing purposes, the number of follow up tests might be infeasible unless the same swab could be used in multiple tests. Then it boils down to a binary search problem, potentially something like log(N) early in the pandemic rather than N. If the swabs cannot be reused, you would need to re-swab everyone. In any case, swabbing rather than testing becomes a limiting constraint. What's the chance that the CDC could send an annual swab survey out like a census and collect samples in the mail? I think certain districts with ideological hatred for the federal government would mail back dog feces instead of their DNA, but if we could do this it would be very useful for an entire panel of infectious diseases.
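The binary-search idea above can be sketched in a few lines, under strong assumptions: exactly one infected sample in the pool, a perfectly reliable assay, and swab material that can be re-divided at every step (the big "if" the comment raises). The names `locate_positive` and `assay` are mine, purely illustrative.

```python
# Sketch of adaptive sub-pool binary search: if a pool tests positive
# and sample material can be re-divided, a single infected individual
# can be located in ~log2(N) assays instead of N individual tests.
# Assumptions: exactly one positive in the pool, perfect assay.

from typing import Callable, Sequence

def locate_positive(pool: Sequence[str], assay: Callable[[Sequence[str]], bool]) -> str:
    """Find the single infected sample in `pool` via binary search."""
    candidates = list(pool)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if assay(half):          # the infection is in the first half
            candidates = half
        else:                    # otherwise it must be in the second half
            candidates = candidates[len(candidates) // 2:]
    return candidates[0]

# Toy run: person "p13" is infected among 64 people.
infected = {"p13"}
people = [f"p{i}" for i in range(64)]
fake_assay = lambda pool: any(s in infected for s in pool)
print(locate_positive(people, fake_assay))  # p13, found in 6 assays
```

For 330M people and a low infection rate, the savings over individual re-testing are large, but everything hinges on being able to split or re-collect the sample.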

The easiest way to implement this (household level) is also the relevant unit for tracing purposes.

+1, just having households all test using the same test would allow you to focus on infected households and cut the required tests by a significant percentage.

What percentage would refuse to participate because Gumment / Illuminati / what else are they going to do with my DNA / my freedom / it's not a swab, it's a sneaky vaccine mind control drug? I guess 40% of the population would refuse to participate.

Relatedly, are there any numbers on the false positive and false negative rates for the COVID19 tests? Some epi folks on Twitter were claiming that the FPR should be close to zero, as the PCR tests require a genetic match, but do we know anything about the False Negative rates?

The FPR could be as high as 80%! See: https://pubmed.ncbi.nlm.nih.gov/32133832/

Implications of 80% false positive would seem to be:

- If you assume the Italian deaths are true cases and the tests are only 20% accurate, then fatality rates are something like 40-80%. Beyond Ebola.
- If you assume the Italian deaths are also 80% false positives, then a lot of people died suddenly for no apparent reason, completely unconnected with the virus.
- The growth rate is lower than we think.
- South Korean and Singaporean contact tracing would probably flatly fail to work.

You could argue that the true cases in the general population are still at the same ratios we thought, but that 80% of the true cases are hidden. This is a real stretch, though. If 80%, why not 800%, etc.? If the tests are completely unreliable, why would they be unreliable in just such a way as to come back to exactly what we thought when we thought they were reliable?

(I get the impression that the 80% false positives is something to make the low % hospitalizations and deaths in countries that test and count heavily "go away", to maintain the very high case hospitalization rates and death rates which they've committed themselves to.... I don't want to get too into motivations, but hopefully there is some critical thinking and consistency going on about the implications of these things, rather than "False positives for Germany, South Korea and Iceland; no false positives for North Italy. Because." inconsistencies).

I don't think that estimate is reasonable: they define true positives based on infection rates. Also, there are a lot of people who are super-spreaders (e.g. patient #31 in South Korea) and non-spreaders (there was a journal article about a couple from WA who had close contact with 50+ people; none of them got COVID).

Also, it doesn't help that the paper is in Chinese and not accessible from US.

My understanding is that false positives are not the problem, but the false-negative rate could be as high as 40%, which would be a significant issue for the pooling idea.
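To put a rough number on that concern: under two-stage pooling, an infected person has to be caught twice, once in the pooled assay and once individually, so false negatives compound. A toy calculation, assuming the same sensitivity applies to both stages and that pooling adds no further dilution penalty (optimistic); the function name is mine:

```python
# Rough effect of false negatives on two-stage (Dorfman) pooling.
# Assumption: each assay independently detects an infected sample with
# probability `sensitivity`, and dilution adds no extra penalty.

def two_stage_sensitivity(sensitivity: float) -> float:
    """An infected person must test positive in the pool AND individually."""
    return sensitivity ** 2

print(two_stage_sensitivity(0.60))  # with 40% false negatives: 0.36
print(two_stage_sensitivity(0.95))  # with  5% false negatives: 0.9025
```

At a 40% per-test false-negative rate, nearly two thirds of infected people would slip through a two-stage scheme, which is why the assay's sensitivity is the crux of the whole proposal.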

Yes - this could be promising. Particularly for resource-constrained LMICs.

Here is some related analysis for pooling in the case of TB in LMICs:

And two empirical analyses:

The math is correct:
- 10.46% chance of a positive pool, with 0.19557 expected tests per person, for case 1 (pool of 11, prevalence 0.01)
- 3.15% and 0.06276 respectively for case 2 (pool of 32, prevalence 0.001)
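These figures can be reproduced with a short script. A minimal sketch, assuming independent infections and a perfectly sensitive assay (the function name is mine):

```python
# Quick check of the Dorfman-pooling arithmetic from the email.
# Assumptions: infections are independent across individuals, and the
# assay is perfectly sensitive/specific at these pool sizes.

def expected_tests_per_person(pool_size: int, prevalence: float) -> float:
    """Expected assays per person under two-stage (Dorfman) pooling.

    Stage 1: one pooled test per group of `pool_size` samples.
    Stage 2: if the pool is positive, every member is tested individually.
    """
    p_pool_positive = 1 - (1 - prevalence) ** pool_size
    return 1 / pool_size + p_pool_positive

# Case 1: pools of 11 at 1% prevalence
e1 = expected_tests_per_person(11, 0.01)
print(f"{e1:.5f} tests/person, {1 / e1:.1f}x throughput")  # 0.19557, 5.1x

# Case 2: pools of 32 at 0.1% prevalence
e2 = expected_tests_per_person(32, 0.001)
print(f"{e2:.5f} tests/person, {1 / e2:.1f}x throughput")  # 0.06276, 15.9x
```

Multiplying by a 330M population gives roughly the 64M and 21M test counts in the email.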

It's a little more work making sure it's done correctly, but it seems worth it. Just don't let the CDC do it, haha.

And yes of course, we could sink a lot of money into testing 24/7 and it would still be dwarfed by the cost of the shutdown.

Yes, Iceland is testing larger random samples, but the big story is the number of positives without symptoms: maybe 50%. Natural immunity, anyone? Is that a game changer?

Even in countries with extremely good testing, there is no suggestion that something like natural immunity has been achieved. Even the most pessimistic assumptions in a place like South Korea or Germany put less than 3% of the population as actually infected, even assuming that every listed case represents 100 unlisted cases. Currently, using that 1-to-100 ratio, Germany has a total of 3% of its population infected, and South Korea under 2%. In both countries, the health care system is already at its limits.

Now imagine increasing the number of infected people from under 3% to under 10% in the next month or two. You can believe that a number of countries like Spain or Italy already have. And keep in mind that the 1-to-100 ratio is absolutely not supported by any data; it is simply used to show how absurd the idea of natural immunity is at this point.

8,961 cases South Korea / population 51 million https://www.worldometers.info/coronavirus/country/south-korea/

24,873 cases Germany / population 83 million

We have no idea how many people are infected. I would not be surprised if 30% (or more) of the population in Germany has already been infected. (For what it's worth, 100% of the people I know in Germany are infected.)
And I wouldn't call such a scenario "pessimistic".

And zero of the people I know in Germany are infected.

Try using some numbers to buttress your claim of 30%, because these are the testing results from March 15 - 'The report states: “definitely more than 167,000 [tests have been conducted]. As the German Hospital Society (DKG) announced on Thursday, 167,009 samples were tested in 148 laboratories by the end of last week, of which 6540 were positive.” We interpret ‘the end of last week’ as the 15 March.' https://ourworldindata.org/coronavirus-testing-source-data

And in Germany, if someone is suspected to have the virus, the health system uses quarantining and contact tracing to try to stop the spread, though that system is on the verge of breaking down. For example, you call your doctor, say you have a fever and headache, the result is you and everyone in your household will be placed in quarantine for a minimum of 14 days or until you are tested and are negative. Again, the entire household is quarantined for 14 days, and quarantine is only lifted when covid-19 is no longer present in any household member. In other words, it isn't even 30% in people who have been exposed to someone who has the disease, even after subtracting the negative test results following recovery from covid-19.

Of course the numbers are still extremely imprecise, but it is easy to use what numbers exist to at least make realistic boundaries to speculation. 30% already infected Germany is an imaginary number, to put it mildly.

I do know people in quarantine, but have yet to learn their test results. Which I expect to be negative, actually, since having a headache and joint pain are not really strong symptoms of covid-19.

I'd almost forgotten that part of prior's shtick was pretending to be German, or to have lived in Germany, or some such.

Zero of the people that you know, know that they are infected.

Using the numbers in the comment concerning Germany, to reach that 30 percent, the ratio of confirmed cases to completely unknown cases has to be 1 to 1000.

The tooth fairy is about as likely to cause Covid19 as the idea that for every confirmed case of Covid19 there are 1,000 unknown cases. Maybe you can work out the infection rate for such a disease, because it must be truly record-setting to have gone from one confirmed case on Jan. 27 to almost 30 million people on March 23, with more than 29 million of those cases never even being suspected.

I think there is a misunderstanding here. I was talking about natural immunity, that is, pre-existing immunity to Covid-19, not immunity acquired a few days or weeks ago from a Covid infection.
You test a random sample and say 50% of the positives never develop any symptoms. It's possible they got infected but had pre-existing antibodies, so no symptoms. They would still show positive for the virus for a few days, as it takes about 3 days for the adaptive immune response to strongly kick in.
That's a reasonable assumption; I don't know if it's true. The only way to truly resolve this is with a serological test (to detect antibodies): you look for people who test negative for the virus but have antibodies to Covid.
If this number is high, it should not be hard to find.
The test must be sensitive, as the antibody level might be low.
This was the hypothesis of the Israeli scientist Michael Levitt.

This is called group testing (https://en.wikipedia.org/wiki/Group_testing) and is currently being done to sensitivity limits

“the Centers for Disease Control and Prevention (CDC) released updated guidelines on Monday (March 9) that allow labs to combine samples from the nose and throat into one test, halving the number of extraction kits needed.”


But that link is about pooling specimens from the same individual, not specimens from different individuals, right?

Testing multiple DNA samples at once is pretty common... there is even a name for it: multiplexing. You can actually identify the positive sample from the multiplex by chemically binding each sample to a unique signature.

I don't know whether multiplexing is currently being used for this.

A follow-up: I just confirmed that there are coronavirus tests multiplexing up to 94 samples at a time: https://www.thermofisher.com/us/en/home/clinical/clinical-genomics/pathogen-detection-solutions/coronavirus-2019-ncov/genetic-analysis/taqpath-rt-pcr-covid-19-kit.html

Thank you!

This looks like something different to me. According to an explanatory page on the same website, multiplexing is when you simultaneously test for (amplify) multiple genes in the same specimen, not when you test multiple specimens for the same gene.


It also doesn't mention "group testing" anywhere (which is the standard terminology), nor does it mention the need for re-testing individual members of the group if the group tests positive. Finally, it is being sold as a method for testing hospital patients who are likely to have the virus, which is exactly the sort of subject for which group testing is of little value. If your prior probability on a patient having the virus is p, it is useless to do group testing using groups much larger than N ~ 1/p because the group usually comes back positive. Instead, group testing is ideal for finding rare infections from groups which have a very low infection rate.

Can you help me reconcile this?

Pooling has a long and successful history. It was initially developed to rapidly test for syphilis during World War 2. The Wasserman test for syphilis is sensitive enough to detect one syphilitic sample in a large pool. To pool, take two samples from each person, pool the first and do one test. If it is positive, test each of the second samples. There is an optimal pool size that depends on the prevalence of the disease. See https://en.wikipedia.org/wiki/Group_testing for a good summary.
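The "optimal pool size" mentioned above can be found numerically by minimizing the expected number of tests per person. A sketch, assuming independent infections and a perfect assay (this is a back-of-envelope check, not a validated protocol):

```python
# Brute-force search for the optimal Dorfman pool size: minimize
# expected tests per person, 1/n + (1 - (1-p)^n), over pool size n.
# Assumptions: independent infections, perfectly sensitive assay.

def optimal_pool_size(prevalence: float, max_n: int = 200) -> tuple[int, float]:
    best = min(
        range(2, max_n + 1),
        key=lambda n: 1 / n + 1 - (1 - prevalence) ** n,
    )
    return best, 1 / best + 1 - (1 - prevalence) ** best

for p in (0.01, 0.001):
    n, cost = optimal_pool_size(p)
    print(f"prevalence {p:.3%}: pool size {n}, {cost:.4f} tests/person")
# Under these assumptions: pool size 11 at 1% prevalence, 32 at 0.1%.
```

Those optima match the pool sizes chosen in the original email, which is presumably where its 11 and 32 came from; the optimal size grows roughly as 1/sqrt(p) as prevalence falls.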

You may even get around narrowing down the infected: pool people working together, and send them all into quarantine (as you should) when a positive is found.

Pooling is very common in the blood collection industry. It was one of the key reasons we were able to roll out HIV and HCV nucleic acid testing (NAT) 20 years ago to test blood donations. In that setting, pooling has typically ranged from 16-24 sample pools; as noted above, if the pool is positive samples are retested individually to determine which donor was positive. I also work in clinical labs, including those now performing COVID-19 testing. In short, I can't imagine this approach being implemented except under the direst of situations . . . it just goes against the entire culture of segregating individual patient samples.

Yes, there are tests which are $20 per test. IgG/IgM tests for other diseases go for $20, and IgG/IgM tests for covid-19 have started to be produced around the globe.

Checked - currently some charge $150 for such a test, but some companies are ready to deliver for $10 per test, so pooling makes a lot less sense; a rapid increase in cheap tests does make sense, though.

You'll notice nobody besides Singapore and South Korea is using antibody tests.

In fact, that is why they have been successful in stopping this.

I'm following this, and the current serological tests are only 60% accurate. More needs to be done before these can be rolled out. They also require a blood draw, which poses a person-to-person contact issue.

Too bad the Theranos project collapsed in fraud; that's the kind of approach that is needed in this pandemic.

From materials I've seen:

Clinical results using the COVID-19 IgG/IgM Rapid Test show:

The sensitivity of the IgM test is 87.9% (87/99) and specificity is 100% (14/14) when compared to RT-PCR.

The sensitivity of the IgG test is 97.2% (35/36) during patients' convalescence period and specificity is 100% (14/14).

or https://www.biospace.com/article/releases/20-20-bioresponse-to-launch-rapid-coronavirus-test-kits-in-u-s-following-green-light-from-fda/

Accurate: High sensitivity (~97%) and specificity (~92%)
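One caveat worth spelling out: at low prevalence, even ~92% specificity means most positives are false. A Bayes' rule sketch (the 1% prevalence figure is illustrative, not from the linked release; the function name is mine):

```python
# Positive predictive value: probability that a positive result is a
# true positive, given the test's sensitivity/specificity and the
# base rate of infection in the tested population.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    true_pos = sensitivity * prevalence            # infected and detected
    false_pos = (1 - specificity) * (1 - prevalence)  # healthy but flagged
    return true_pos / (true_pos + false_pos)

print(f"{ppv(0.97, 0.92, 0.01):.1%}")  # ~10.9%: roughly 9 in 10 positives are false
```

That is why "~97% sensitivity and ~92% specificity" can still perform poorly as a population-screening tool, even though it sounds accurate.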

Would HIPAA pose a problem? The test results would impliedly release information about other people's condition.

I haven't seen any robust calls for random-sample testing, perhaps combined with the pooling suggested above. An ongoing random sample for the virus - along with tests for antibodies, if relevant - would allow a much better idea of where things stand. Trying to test the entire population via brute force - even via pooling - seems like overkill and a waste of resources. I don't fully understand the math above, but the number of tests he is suggesting still seems way beyond our capabilities.

There is a simple reason for no calls for random sample testing - we cannot even test reasonably suspected cases as of today. There is no spare capacity.

Basically, this is much the same as saying that we should be pulling a random sample of N95 masks to make sure they conform to the appropriate standards.

There are some reports that LA and NYC may be giving up on some testing ambitions. I am thinking that the testing regime is so out of whack at the moment that taking some of the tests for random sampling would not be a great detriment.

We don't have the spare capacity today, but if we want to get to the point where we can handle this without social distancing, we will need to be ready when we do have the capacity.

How important is testing of patients at this point? We are telling suspected cases to quarantine already. In a hospital setting, does a positive test change the protocol of care? My point being that if hospitals are already giving up on testing, and it doesn't much affect the care of patients, the informational value of immediate random sampling could far outweigh localized gains from reactive spot testing. But I legitimately do not know the value of testing in hospitals right now. I can understand rigorous testing of medical workers a bit. But if patients are already overwhelming specialized infrastructure for these types of cases, and not being isolated beyond some minimum level of caution, is there excess capacity to be found in testing?

Re: Iceland - their testing shows a 0.86% prevalence in the general population. False positives notwithstanding, and assuming at least a similar rate of infection, that would suggest at least 2.6 million infections in the US already. The NBA apparently has 16 confirmed player cases; that is a rate of 2.7%. So those seem like reasonable lower and upper limits on cases for the moment. Again, assuming any accuracy to the testing.

Even the actual case count outside their sample (possibly an under-count) is about 600 recorded cases in a population of about 300,000, which is 0.2%, or about 2x Italy's official rate as of today (59,138 in a population of about 60 million) - and Italy's rate itself is quite probably an under-count by a significant factor (even 10x?).

Still, the disease would only have to keep growing exponentially for another two weeks (14 days) beyond the period over which cases have been recorded in Italy (52 days since the first case) to get to about 1%.

The Italian data have their case count not increasing for about 21 days at the start....

Should such a prevalence of 1% be possible, "Bringing the hammer down to contain" and "Let's nuke the curve!" become sort of a joke, though efforts will still mitigate (at a cost). (And yes, this whole thing likely does become a "once-in-a-century evidence fiasco".)

" If we assume base infection rate of the population is 1%, then pooling of 11 samples has a ~10% chance of coming out positive."

It should go the other way. 1.01**11 = 11.5% chance.
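For what it's worth, the two calculations can be compared directly. Assuming independent infections at 1% prevalence, the exact probability that a pool of 11 contains at least one positive is 1 - 0.99^11, while 1.01^11 - 1 is an approximation of the same quantity; both are close to 11 × 0.01 when prevalence is small:

```python
# Comparing the email's figure with the comment's alternative.
# Assumption: infections are independent with probability 0.01 each.

exact = 1 - 0.99 ** 11    # P(at least one positive in a pool of 11)
approx = 1.01 ** 11 - 1   # compounding approximation of the same quantity

print(f"exact: {exact:.4f}, approx: {approx:.4f}")  # exact: 0.1047, approx: 0.1157
```

Both values round to "~10-11%", so the email's back-of-envelope conclusion is unchanged either way.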

The most recent This Week in Virology podcast (#593) talked about pooling in order to test more people with existing tests (but not scaled at the level of the entire workforce). Their concerns were that labs would have difficulty executing the tests appropriately without cross-contamination but they didn't express any concerns about the sensitivity. That might change depending on how many people would be grouped in a single test, but RT-PCR is extremely sensitive.

Excellent idea. However, swabs to get the samples are now also limiting.

However, getting bureaucrats to adopt such an approach in a reasonable time frame appears the impossible step.

There's an even better way to actually increase test output by pooling, which we use in distributed systems. We call it shuffle sharding. Given that we have N test kits and M people, we take samples from every person and put each sample in more than one test kit. The key is that no 2 persons' samples end up in the same *set* of test kits. Do the math; this tremendously amplifies the test capacity.
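A toy version of the shuffle-sharding idea above: give each person a unique *pair* of pools, so N pools can distinguish up to C(N, 2) people when at most one person is infected. This is nonadaptive combinatorial group testing; the function names are mine, and a real design would have to handle multiple simultaneous positives and assay error.

```python
# Combinatorial pooling sketch: each person's sample goes into a
# distinct pair of pools, and the pattern of positive pools identifies
# the infected person. Assumptions: perfect assay, exactly one positive.

from itertools import combinations

def assign_pools(num_people: int, num_pools: int) -> dict[int, tuple[int, int]]:
    """Map each person to a distinct pair of pools."""
    pairs = list(combinations(range(num_pools), 2))
    assert num_people <= len(pairs), "not enough distinct pool pairs"
    return {person: pairs[person] for person in range(num_people)}

def decode(positive_pools: set[int], assignment: dict[int, tuple[int, int]]) -> int:
    """Identify the single infected person from the positive pools."""
    for person, pools in assignment.items():
        if set(pools) == positive_pools:
            return person
    raise ValueError("pattern does not match a single infection")

# 10 pools suffice for up to C(10, 2) = 45 people.
assignment = assign_pools(45, 10)
infected = 17
positive = set(assignment[infected])
print(decode(positive, assignment))  # 17
```

With two positives the union of four positive pools can become ambiguous, which is why practical designs use larger per-person pool sets and error-tolerant decoding.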


Thanks for the link Jorge.

It is a very stupid idea.

The dominant cost of the COVID-19 test is not the reagent. It is the labor and operation costs of collecting the sample under very strict medical conditions, to avoid infecting the health care workers with a very contagious disease. The reagent costs about $50; the labor and operations cost about $1,000. Now the costs for those in the infected pool just double. It would be better to use that money to buy more reagents.

Hi dux.ie, do you have a source for this claim? $1,000 for labor costs sounds very high--what is the hourly wage of these healthcare workers, and how many hours does it take them to perform a single test? (You also mentioned operation costs--what do those consist of?)

Is the shortage of testing in the US a shortage of labor or a shortage of reagent? Or something else?

Go watch the C-SPAN testimony of CDC director Redfield and Fauci. A congresswoman forced Fauci to acknowledge the cost breakdowns. She also pressed Redfield to absorb the testing costs, as he is entitled to decide that himself.

Besides costs: the COVID-19 test is also very sensitive to the concentration of virus in the sample. Different sampling depths can cause drastically different results. Now you want to dilute that further by pooling? Also, on average 80% of people are not infected because of their strong immune system, i.e. their white blood cells will gobble up the virus. Now you want to mix up the samples so that those strong white blood cells can clean up the pooled sample??

The operation costs: the total cost of setting up the test site, divided by the throughput.

Comments for this post are closed