The case against the import of GPT-3

From an email from Agustin Lebron, noting that I will impose no further indentation:

“One thing that’s worth noting:

The degree of excitement about GPT-3 as a replacement for human workers, or as a path to AGI, is strongly inversely correlated with:

(a) How close the person is to the actual work. If you look at the tweets from Altman, Sutskever and Brockman, they’re pumping the brakes pretty hard on expectations.
(b) How much the person has actually built ML systems.

It’s a towering achievement to be able to train a system this big. But to me it’s clearly a dead-end on the way to AGI:

– The architecture itself is 3 years old: https://arxiv.org/abs/1706.03762. It is not an exaggeration to say that GPT-3’s architecture can be described as “take that 2017 paper and make 3 numbers (width, # layers, # heads) much bigger”. The fact that there hasn’t been any improvement in architecture in 3 years is quite telling.

– In the paper itself, the authors clearly say they’re quite near fundamental limits in being able to train an architecture like this. GPT-3 isn’t a starting point, it’s an end-point.

– If you look at more sober assessments (http://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html, https://minimaxir.com/2020/07/gpt3-expectations/), without the tweet selection bias, it starts to look less impressive.

– Within my fairly heterogeneous circle of ML-expert friends, there’s little disagreement about dead-end-ness.

The most interesting thing about GPT-3 is the attention and press that it’s gotten. I’m still not sure what to make of that, but it’s very notable.

Again, it’s incredibly impressive and also piles of fun, but I’m willing to longbet some decent money that we’re not replacing front-end devs with attention-layer-stacks anytime soon.”

I remain bullish, but it is always worth considering other opinions.
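
For readers who want the "3 numbers" point in the email made concrete, here is a rough sketch, not anything from the email or from OpenAI's code. The class and function names are hypothetical. The GPT-3 figures (width 12288, 96 layers, 96 heads) are the ones reported for the 175B model in the GPT-3 paper, and the 2017 figures are the "base" configuration from the linked paper. The parameter count uses the common back-of-the-envelope rule of about 12 × layers × width² for a decoder-only stack whose feed-forward layer is 4× the width; it ignores embeddings, biases and layer norms, so treat the output as an order-of-magnitude estimate only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    """Hypothetical container for the three numbers named in the email."""
    d_model: int   # width
    n_layers: int  # number of layers
    n_heads: int   # number of attention heads

    @property
    def d_head(self) -> int:
        # Each head works on an equal slice of the model width.
        return self.d_model // self.n_heads


def approx_block_params(cfg: TransformerConfig) -> int:
    """Rough weight count for a decoder-only stack (assumes d_ff = 4 * d_model).

    Per layer: 4 * d_model^2 for the attention projections (Q, K, V, output)
    plus 8 * d_model^2 for the two feed-forward matrices.
    The number of heads does not change this total.
    """
    attention = 4 * cfg.d_model ** 2
    feed_forward = 8 * cfg.d_model ** 2
    return cfg.n_layers * (attention + feed_forward)


# Width, layers and heads as reported for the largest GPT-3 model.
GPT3_175B = TransformerConfig(d_model=12288, n_layers=96, n_heads=96)

# Same three numbers for the 2017 "base" model. That model is an
# encoder-decoder, so the estimate below is for a decoder-only stack
# of that size, not for the actual 2017 model.
TRANSFORMER_2017_BASE = TransformerConfig(d_model=512, n_layers=6, n_heads=8)

if __name__ == "__main__":
    for name, cfg in [("2017 base sizes", TRANSFORMER_2017_BASE),
                      ("GPT-3 175B", GPT3_175B)]:
        print(f"{name}: d_head={cfg.d_head}, "
              f"~{approx_block_params(cfg) / 1e9:.2f}B block parameters")
```

On this estimate the GPT-3 sizes come out at roughly 174 billion weights, close to the headline 175 billion, which is consistent with the email's description of the change as mostly a matter of scale.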