Category: Science

The case against the import of GPT-3

From an email from Agustin Lebron, noting that I will impose no further indentation:

“One thing that’s worth noting:

The degree of excitement about GPT-3 as a replacement for human workers, or as a path to AGI, is strongly inversely correlated with:

(a) How close the person is to the actual work. If you look at the tweets from Altman, Sutskever and Brockman, they’re pumping the brakes pretty hard on expectations.
(b) How much the person has actually built ML systems.

It’s a towering achievement to be able to train a system this big. But to me it’s clearly a dead-end on the way to AGI:

– The architecture itself is 3 years old: https://arxiv.org/abs/1706.03762. It is not an exaggeration to say that GPT-3’s architecture can be described as “take that 2017 paper and make 3 numbers (width, # layers, # heads) much bigger”. The fact that there hasn’t been any improvement in architecture in 3 years is quite telling.

– In the paper itself, the authors clearly say they’re quite near fundamental limits in being able to train an architecture like this. GPT-3 isn’t a starting point, it’s an end-point.

– If you look at more sober assessments (http://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html and https://minimaxir.com/2020/07/gpt3-expectations/), without the tweet selection bias, it starts to look less impressive.

– Within my fairly heterogeneous circle of ML-expert friends, there’s little disagreement about dead-end-ness.

The most interesting thing about GPT-3 is the attention and press that it’s gotten. I’m still not sure what to make of that, but it’s very notable.

Again, it’s incredibly impressive and also piles of fun, but I’m willing to longbet some decent money that we’re not replacing front-end devs with attention-layer-stacks anytime soon.”

I remain bullish, but it is always worth considering other opinions.
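To make the email’s “take that 2017 paper and make 3 numbers much bigger” point concrete, here is a rough back-of-the-envelope sketch of my own (not from the email or from OpenAI): the layer, width, and head counts are the published configurations of the 2017 Transformer base model and of GPT-3, and the parameter formula is a standard approximation that ignores embeddings and biases.

```python
# Rough comparison of the 2017 Transformer ("Attention Is All You Need",
# base configuration) with GPT-3 175B.  The 12 * layers * d_model^2 formula
# is a common approximation for non-embedding parameter count.

def approx_params(n_layers: int, d_model: int) -> float:
    """Approximate non-embedding parameter count of a Transformer stack."""
    return 12 * n_layers * d_model ** 2

configs = {
    # name: (layers per stack, d_model, attention heads)
    "Transformer base (2017)": (6, 512, 8),
    "GPT-3 175B (2020)":       (96, 12288, 96),
}

for name, (layers, d_model, heads) in configs.items():
    params = approx_params(layers, d_model)
    print(f"{name}: layers={layers}, d_model={d_model}, heads={heads}, "
          f"~{params / 1e9:.2f}B params")

# Prints roughly 0.02B for the 2017 base model versus ~174B for GPT-3:
# the same architecture, with three numbers made much bigger.
```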

Which country has had the best response to the coronavirus?

I pick the United Kingdom, even though their public health response has been generally poor.  Why? Their researchers have discovered the single-best mortality-reducing treatment, namely dexamethasone (the cheap steroid), and the Oxford vaccine is arguably the furthest along.  In a world where ideas are global public goods, research matters more than the quality of your testing regime!

And the very recent results on interferon beta — still unconfirmed I should add — come from…the UK.

At the very least, the UK is a clear first in per capita terms.  Here are the closing two paragraphs:

It is fine and even correct to lecture the British (and the Americans) for their poorly conceived messaging and public health measures. But it is interesting how few people lecture the Australians or the South Koreans for not having a better biomedical research establishment. It is yet another sign of how societies tend to undervalue innovation — which makes the U.K.’s contribution all the more important.

Critics of Brexit like to say that it will leave the U.K. as a small country of minor import. Maybe so. In the meantime, the Brits are on track to save the world.

Here is my full Bloomberg column on that topic.  And if you wish to go a wee bit Straussian on this one, isn’t it better if the poor performers on public health measures — if there are going to be some — are (sometimes) the countries with the best and most dynamic biomedical establishments?  Otherwise all the panic and resultant scurry amounts to nothing.  When Mexico has a poor public health response to Covid-19, the world doesn’t get that much back in return.  In this regard, I suspect that biomedical innovation in the United States is more sensitive to internal poor performance on Covid-19 than is the case for Oxford.

Another attempt to address the Fermi paradox — aestivation

According to a research paper accepted for publication in the Journal of the British Interplanetary Society, extraterrestrials are sleeping while they wait. In the paper, authors from Oxford’s Future of Humanity Institute and the Astronomical Observatory of Belgrade Anders Sandberg, Stuart Armstrong, and Milan Cirkovic argue that the universe is too hot right now for advanced, digital civilizations to make the most efficient use of their resources. The solution: Sleep and wait for the universe to cool down, a process known as aestivating (like hibernation but sleeping until it’s colder).

And:

The universe appears to be cooling down on its own. Over the next trillions of years, as it continues to expand and the formation of new stars slows, the background radiation will reduce to practically zero. Under those conditions, Sandberg and Cirkovic explain, this kind of artificial life would get “tremendously more done.” Tremendous isn’t an understatement, either. The researchers calculate that by employing such a strategy, they could achieve up to 10³⁰ times more than if done today. That’s a 1 with 30 zeroes after it.

Here is the full article, via the excellent Samir Varma.
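The 10³⁰ figure traces back, as I read the underlying paper, to a Landauer-limit argument: erasing one bit of information costs at least kT·ln 2 of energy, so the amount of computation a fixed energy budget buys scales inversely with temperature. Here is a minimal sketch of that arithmetic, using today’s cosmic background temperature of about 2.7 K and an assumed far-future temperature on the order of 10⁻³⁰ K (that second number is my illustrative assumption, not a figure quoted in the article):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def landauer_cost(temp_kelvin: float) -> float:
    """Minimum energy (joules) needed to erase one bit at a given temperature."""
    return K_B * temp_kelvin * math.log(2)

t_now = 2.7        # present cosmic microwave background temperature, K
t_future = 1e-30   # assumed far-future temperature, K (illustrative only)

cost_now = landauer_cost(t_now)
cost_future = landauer_cost(t_future)

# With a fixed energy budget, feasible bit erasures scale as 1/T,
# so the gain from waiting is just the ratio of temperatures.
gain = cost_now / cost_future
print(f"Erasure cost now:   {cost_now:.2e} J/bit")
print(f"Erasure cost later: {cost_future:.2e} J/bit")
print(f"Computation gain:   ~{gain:.1e}x")  # on the order of 10^30
```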

*False Alarm*, the new book by Bjorn Lomborg

The subtitle is How Climate Change Panic Costs Us Trillions, Hurts the Poor, and Fails to Fix the Planet.

I agree with the author’s claim that climate change is not an existential risk for humanity.  Still, both the title and subtitle bother me.  The alarm does not seem to be a false one, even if many of the worriers make grossly overstated claims about the end of the earth.  And right now “climate change panic” is not costing us “trillions”; rather, virtually all countries are failing to reduce their carbon emissions and most are not even trying very hard.

There should be more of a focus on the insurance value of avoiding the worst plausible scenarios, which are still quite bad.  There is no argument in this book which overturns the Weitzman-like calculations that preventive measures are desirable.
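To see the flavor of that insurance argument, here is a toy expected-damage calculation of my own, with invented numbers rather than anything from Weitzman or from the book: a small-probability catastrophic scenario can contribute a large share of expected damages, which is what makes paying a meaningful “premium” in preventive measures worthwhile even if the central scenario is manageable.

```python
# Toy expected-damage calculation with a fat tail (all numbers invented).
# Damages are expressed as a percent of world GDP.

scenarios = [
    # (probability, damage as % of GDP)
    (0.90,  3.0),   # central scenario: costly but manageable
    (0.09, 10.0),   # bad scenario
    (0.01, 50.0),   # worst plausible scenario (the fat tail)
]

expected_damage = sum(p * d for p, d in scenarios)
tail_contribution = (0.01 * 50.0) / expected_damage

print(f"Expected damage: {expected_damage:.2f}% of GDP")
print(f"Share contributed by the 1% tail scenario: {tail_contribution:.0%}")
# With these numbers the one-in-a-hundred scenario alone accounts for roughly
# an eighth of expected damages, so measures that trim the tail are worth
# real money even when the modal outcome is tolerable.
```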

I can report that the author endorses a carbon tax, more investment in innovation, and greater adaptation, with geoengineering as a back-up plan, more or less the correct stance in my view.

There is much in this book of value, and the criticisms of the exaggerated worriers are mostly correct.  Still, the oppositional framing of the material doesn’t seem appropriate these days, and Lomborg will have to choose whether he wishes to be “leader of the opposition,” or “provider of the best possible message.”  Or has he already chosen?

On some limitations of personality psychology

…Big Five Conscientiousness was not found to correlate with mask wearing in a sample of thousands in Spain during the coronavirus epidemic (Barceló & Sheen, 2020). This was not treated by the authors as any kind of falsification of the Big Five, or even evidence against it. The abstract noun “conscientiousness” has a rich meaning, only part of which is captured by the Big Five, and only a tinier part of which is captured by the two-question methodology used here (“does a thorough job” and “tends to be lazy”). But Conscientiousness is often correlated to health behaviors, and is often said to predict them with various strengths, even though the questions in the survey focus on job performance and tidiness.

Here is the full essay by “a literal banana,” interesting throughout.

What should I ask Nicholas Bloom?

I will be doing a Conversation with him, so what should I ask?  Here is part of his official bio:

Nicholas (Nick) Bloom is the William Eberle Professor of Economics at Stanford University, a Senior Fellow of SIEPR, and the Co-Director of the Productivity, Innovation and Entrepreneurship program at the National Bureau of Economic Research. His research focuses on management practices and uncertainty. He previously worked at the UK Treasury and McKinsey & Company.

Is there anyone whose name is on more important/interesting papers over the last ten years?  Here is a sampling.

So what should I ask him?

Cornell understands the equilibrium

They are reopening campus for the coming semester and here is one reason why:

…the finding from Cornell researchers that holding the semester online potentially could result in more infections and more hospitalizations among students and staff members than holding the semester in person would.

A study by Cornell researchers concluded that with nominal parameters, an in-person semester would result in 3.6 percent of the campus population (1,254 people) becoming infected, and 0.047 percent (16 people) requiring hospitalization. An online semester, they concluded, would result in about 7,200 infections and more than 60 hospitalizations.

Do note it is critical to the argument that the returning students actually are tested on a regular basis, which of course is very hard to enforce on-line.
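The mechanism, as I understand it, is surveillance testing: students who come back to town anyway but enroll online cannot be put on a mandatory testing schedule, so their infections circulate longer before being caught. A minimal toy model of that effect, with parameters I have invented for illustration (this is not the Cornell group’s actual simulation):

```python
# Toy comparison: how many people does one infected student infect before
# isolation, under scheduled surveillance testing versus symptom-based
# testing only?  All parameters are invented for illustration.

def onward_infections(daily_contacts: float, transmit_prob: float,
                      infectious_days_before_isolation: float) -> float:
    """Expected onward infections per case before the case is isolated."""
    return daily_contacts * transmit_prob * infectious_days_before_isolation

contacts, p_transmit = 8.0, 0.03

# Tested every few days on campus: caught, on average, ~1.5 days into infectiousness.
in_person = onward_infections(contacts, p_transmit, 1.5)

# Symptom-based testing only: asymptomatic and late-detected cases, say ~6 days.
online = onward_infections(contacts, p_transmit, 6.0)

print(f"Per-case onward infections, in person with scheduled testing: {in_person:.2f}")
print(f"Per-case onward infections, online with symptomatic testing:  {online:.2f}")
# 0.36 versus 1.44: the second number exceeds 1, so transmission chains grow,
# which is the qualitative flavor of the Cornell result.
```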

SARS-CoV-2 T-cell epitopes define heterologous and COVID-19-induced T-cell recognition

The SARS-CoV-2 pandemic calls for the rapid development of diagnostic, preventive, and therapeutic approaches. CD4+ and CD8+ T cell-mediated immunity is central for control of and protection from viral infections[1-3]. A prerequisite to characterize T-cell immunity, but also for the development of vaccines and immunotherapies, is the identification of the exact viral T-cell epitopes presented on human leukocyte antigens (HLA)[2-8]. This is the first work identifying and characterizing SARS-CoV-2-specific and cross-reactive HLA class I and HLA-DR T-cell epitopes in SARS-CoV-2 convalescents (n = 180) as well as unexposed individuals (n = 185) and confirming their relevance for immunity and COVID-19 disease course. SARS-CoV-2-specific T-cell epitopes enabled detection of post-infectious T-cell immunity, even in seronegative convalescents. Cross-reactive SARS-CoV-2 T-cell epitopes revealed preexisting T-cell responses in 81% of unexposed individuals, and validation of similarity to common cold human coronaviruses provided a functional basis for postulated heterologous immunity[9] in SARS-CoV-2 infection[10,11]. Intensity of T-cell responses and recognition rate of T-cell epitopes was significantly higher in the convalescent donors compared to unexposed individuals, suggesting that not only expansion, but also diversity spread of SARS-CoV-2 T-cell responses occur upon active infection. Whereas anti-SARS-CoV-2 antibody levels were associated with severity of symptoms in our SARS-CoV-2 donors, intensity of T-cell responses did not negatively affect COVID-19 severity. Rather, diversity of SARS-CoV-2 T-cell responses was increased in case of mild symptoms of COVID-19, providing evidence that development of immunity requires recognition of multiple SARS-CoV-2 epitopes. Together, the specific and cross-reactive SARS-CoV-2 T-cell epitopes identified in this work enable the identification of heterologous and post-infectious T-cell immunity and facilitate the development of diagnostic, preventive, and therapeutic measures for COVID-19.

Here is the full piece, by Annika Nelde et al., via Jackson Stone.  Or from the paper, here is a simpler bit:

At present, determination of immunity to SARS-CoV-2 relies on the detection of SARS-CoV-2 antibody responses. However, despite the high sensitivity reported for several assays there is still a substantial percentage of patients with negative or borderline antibody responses and thus unclear immunity status after SARS-CoV-2 infection [34]. Our SARS-CoV-2-specific T-cell epitopes, which are not recognized by T cells of unexposed donors, allowed for detection of specific T-cell responses even in donors without antibody responses, thereby providing evidence for T-cell immunity upon infection.

Big (and good) news if true.

Why and how does DARPA work?

Program Managers

At the end of the day the ARPA Model depends on badass program managers. Why is this the case? PMs need to think for themselves and go up and down the ladder of abstraction in an unstructured environment. On top of that they need to be effective communicators and coordinators because so much of their jobs is building networks. There’s a pattern that the abstract qualities that make “great talent” in different high-variance industries boil down to the ability to successfully make things happen under a lot of uncertainty. Given that pattern, the people who would make good DARPA PMs would also make good hedge fund analysts, first employees at startups, etc. so digging into people’s motivations for becoming a PM is important. More precise details about what makes a PM good prevent you from going after the exact same people as every other high-variance industry. When ‘talent’ isn’t code for ‘specialized training’ it means the role or industry has not been systematized. Therefore, despite all the talk here and elsewhere about ‘the ARPA Model’ we must keep in mind that we may be attributing more structure to the process than actually exists.

DARPA program managers pull control and risk away from both researchers and directors. PMs pull control away from directors by having only one official checkpoint before launching programs and pull control away from performers through their ability to move money around quickly. PMs design programs to be high-risk aggregations of lower-risk projects. Only 5–10 out of every 100 programs successfully produce transformative research, while only 10% of projects are terminated early. Shifting the risk from the performers to the program managers enables DARPA to tackle systemic problems where other models cannot.

That is one excerpt from a new and excellent essay by Benjamin Reinhardt, one of the best pieces of this year, via Patrick Collison.

Note also that DARPA underpays staff, does not hire individuals with a significant web presence, deliberately stays small, and makes it easy to reallocate funds on the fly.  The program managers do not work there for any longer than four or five years, by design.

Researchers speaking on the scientific process

When looking at success indicators, we found that indicators related to openness, transparency, quality, and innovation were perceived as highly important in advancing science, but as relatively overlooked in career advancement. Conversely, indicators which denoted prestige and competition were generally rated as important to career advancement, but irrelevant or even detrimental in advancing science. Open comments from respondents further revealed that, although indicators which indicate openness, transparency, and quality (e.g., publishing open access, publishing negative findings, sharing data, etc.) should ultimately be valued more in research assessments, the resources and support currently in place were insufficient to allow researchers to endorse such practices. In other words, current research assessments are inadequate and ignore practices which are essential in contributing to the advancement of science. Yet, before we change the way in which researchers are being assessed, supporting infrastructures must be put in place to ensure that researchers are able to commit to the activities that may benefit the advancement of science.

That is from a recent paper by Noémie Aubert Bonn and Wim Pinxten, via Michelle Dawson.

What’s the smart way to use spare Covid testing capacity?

I have a question for you and/or your MR readers: what’s the smart way to use spare Covid testing capacity?

With the virus (currently) receding in many places, fewer and fewer people are getting symptoms and seeking tests.

Even without a second wave in the next few months, we’ll need testing capacity again for the next flu season, when we’ll need to distinguish between flu patients and Covid patients.

How should we use spare testing capacity in the meantime? Increase random testing? Weekly tests for everyone in a single city? Weekly tests for everyone in particular economic sectors?

I would be grateful for your thoughts on this.

That is from O.L.  My intuition (and I stress this is not a scientific answer in any way) is to test people who take elevators every day, to get a better sense of how risky elevators are.  And then test systematically in other situations and professions to learn more about transmission mechanisms, for instance the subway when relevant, supermarket clerks, and so on.  Test to generate better risk data.  What do you all think?
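One concrete way to “test to generate better risk data”: put a defined exposure group (say, people who ride elevators daily) and a comparison group on a regular testing schedule, and estimate a risk ratio with a confidence interval. Here is a minimal sketch with made-up counts, purely my illustration and not a real study design:

```python
import math

def risk_ratio_ci(cases_a: int, n_a: int, cases_b: int, n_b: int, z: float = 1.96):
    """Risk ratio of group A versus group B with an approximate 95% CI (log method)."""
    risk_a, risk_b = cases_a / n_a, cases_b / n_b
    rr = risk_a / risk_b
    se_log_rr = math.sqrt(1 / cases_a - 1 / n_a + 1 / cases_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical weekly surveillance results (invented numbers):
# 5,000 daily elevator riders with 40 positives, versus 5,000 comparable
# non-riders with 25 positives.
rr, lower, upper = risk_ratio_ci(cases_a=40, n_a=5000, cases_b=25, n_b=5000)
print(f"Risk ratio: {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With these invented counts the interval still straddles one, which is exactly the kind of ambiguity that spare testing capacity, pointed at the right places, could resolve.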

China-U.S. fact of the day

Some 54 scientists have resigned or been fired as a result of an ongoing investigation by the National Institutes of Health into the failure of NIH grantees to disclose financial ties to foreign governments. In 93% of those cases, the hidden funding came from a Chinese institution.

The new numbers come from Michael Lauer, NIH’s head of extramural research. Lauer had previously provided some information on the scope of NIH’s investigation, which had targeted 189 scientists at 87 institutions. But his presentation today to a senior advisory panel offered by far the most detailed breakout of an effort NIH launched in August 2018 that has roiled the U.S. biomedical community, and resulted in criminal charges against some prominent researchers, including Charles Lieber, chair of Harvard University’s department of chemistry and chemical biology.

“It’s not what we had hoped, and it’s not a fun task,” NIH Director Francis Collins said in characterizing the ongoing investigation. He called the data “sobering.”

Here is the full story, and there are further points of interest at the link.

Should departments own and control journals?

There is some discussion on Twitter of this matter, and overall I say yes, I would like to see more of this at the margin.  In economics, the two best-known department-owned journals are the Journal of Political Economy (Chicago) and Quarterly Journal of Economics (Harvard).  They also have longstanding histories of being “a bit different,” the JPE having had a Chicago school orientation, and the QJE publishing lots of Harvard grad students and graduates, and being more willing to accept papers with “behavioral” results, and perhaps with more speculative empirics as well.  In both cases, I should add, those different orientations are much diminished compared to, say, the 1990s, the JPE in particular these days not seeming especially “Chicago school” to me, and I wonder if a Chicago school still exists amongst younger economists.

I am very glad we have had these two journals standing out as different in orientation, and I strongly believe that has encouraged innovation, even if (and in fact because) the AER would not have accepted all of those papers.  A lot of “shaky” behavioral results, for instance, have in fact turned out to be quite relevant or at the very least interesting and worthy of further investigation.

One risk is that the different general interest journals become too much alike, too subject to the same pressures, and too homogenized.  And the actual “monopoly” danger, to the extent there is one, is that the American Economic Association controls too many top journals.

To be clear, I don’t see anything sinister afoot with all the AEA journals, but here is a simple way to express my worry.  If I had to, standing on one foot, recite all of the names of those journals and their missions or areas, I don’t think I could do it without multiple mistakes.  (And frankly not so many people in the entire world devote so much attention to following published economic articles as I do, noting that Larry Katz may be #1.)  Somehow the identities are too blurred together, and I wish someone else were running one or two of them.

I am hardly “anti-big business,” but I view commercial publishers as the worst alternative for journal ownership and control.  In addition to all of the usual complaints, I think the commercial publishers often (not always) care less about the quality of the editor, as the emphasis is on how well the sales force can market the journal to libraries.

So unless you want the AEA to run everything, and I certainly do not, that is going to mean more department-owned journals.  I am impressed by those departments that have the money and the commitment to see these journals through — it is not easy.

As of late, there has been a squabble on Twitter about removing one particular journal editor for his injudicious tweets on recent public events (I don’t wish to link to this and add fuel to the fire).  Everyone is entitled to his or her opinion about this particular editorship, but I will say this: Twitter is not the right forum for such a debate.  I am very pro-Twitter, as I have written numerous times in the past, but it does have some of the biases of virality, including peer pressure, and it is not always good for reproducing context or considering objections and revisions to viewpoints.  Instead, start by writing out your opinion, and considering objections, in a long, judicious, thoughtful piece.  Spend at least a few days on the piece, have three of your more critical friends “referee” it in advance of online publication, and let it be debated for weeks.  Is “too much trouble” really a good reason not to do that?  If you think that who controls the rigorous refereeing process at a top journal is so important, the method for making judgments here is no less important.  “The refereed journals aren’t good, fair, and rigorous enough for me, so we need to slug it out and rush to judgment on…Twitter” just doesn’t make any sense.  We can do better.

Addendum: Paul Novosad has some useful suggestions for encouraging decentralization.