The wisdom of Daniel Gross

Now in 2017, a bunch of people, each of whom now has their own company (the new PayPal Mafia is the Transformer Mafia), wrote this paper called Attention is All You Need, which at the time was mostly ignored by the rest of the world. They came up with a way to effectively parallelize this training, enabling us to create models that are much larger and, as a byproduct, able to hold more context tokens, effectively more words, and so to predict more words for you.

The paper was mostly ignored when it came out; I thought it was neat, but I don’t know that I made much of it. Google at the time had developed a pretty large model based on the paper that it didn’t release, for various reasons we can touch on. Then OpenAI really productized that paper with GPT-2 and GPT-3, the generative pre-trained transformer; the transformer there is the one from Attention is All You Need. They were able to build these successively larger and larger models because they were able to parallelize training. GPT-3 is now considered state-of-the-art, although I think our grandchildren will look at it the same way we look at a tube television.

That is from the new Ben Thompson interview with Daniel and Nat Friedman, and yes I do subscribe to Ben and pay for it.
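The core operation that paper introduces is scaled dot-product attention: every token scores every other token in a single matrix multiply, which is what lets training run in parallel across a whole sequence rather than step by step as in the older recurrent models. Here is a minimal NumPy sketch; the function name, toy sizes, and random inputs are mine and purely illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Every query position scores every key position at once, so the whole
    # sequence is processed in one shot rather than token by token.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq, seq) similarity matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted mix of the values

# Toy self-attention: 4 tokens, 8-dimensional vectors (sizes are arbitrary).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)   # (4, 8)
```

In the real models, Q, K, and V come from learned linear projections of the token embeddings, and many such attention heads run in parallel inside each layer.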
