GPT-3, etc.

Here is an email from a reader; I do not really have an opinion of my own yet.  Please note I will not indent any further:

“I wanted to draw your attention to something. Are you familiar with “AI Dungeon,” a text-based, “open world” RPG running on GPT-2 / GPT-3?  Here’s the author’s discussion on Medium, or you can play the GPT-2 version for free to get a sense of it directly.

But what I really want to draw your attention to is players who are using custom prompts to open up dialogs with GPT-3 about non-game things.

This result is particularly impressive:  https://www.reddit.com/r/slatestarcodex/comments/hrx2id/a_collection_of_amazing_things_gpt3_has_done/fy7i7im/

(Following that string of posts is a little hampered by Reddit’s format; here are the posts in order: part 1, part 2, part 3.)

If the author is to be believed, they’ve had GPT-3 / “Dragon”:

1. write code
2. act as a pharmacology tutor
3. write poetry
4. translate English, French, and Chinese (the instruction to “balance the intent of the author with artistic liberty” is particularly interesting)

It’s hard to excerpt; I’d recommend reading the whole thing if you have time.

Here’s another user’s eloquent conversation about the experience of being an AI, using a similar mechanism (screencap images of the convo, part 1 and part 2), with a sample prompt if you want to converse with GPT-3 yourself via AI Dungeon.

I am increasingly convinced that Scott Alexander was right that NLP and human language might bootstrap a general intelligence. A rough criterion for AGI might be something like (i) pass the Turing test, and (ii) solve general problems; the GPT-3-AI-Dungeon examples above appear to accomplish preliminary versions of both.

GPT was published in June 2018, GPT-2 in February 2019, GPT-3 in May 2020.

As best I can tell, GPT -> GPT-2 was a ~10x increase in parameters over ~8 months, and GPT-2 -> GPT-3 was a ~100x increase in parameters over ~14 months. Any number of naive projections puts a much more powerful release happening over the next ~1-2 years, and I also know that GPT-3 isn’t necessarily the most powerful NLP AI (perhaps just the most widely known).

When future AI textbooks are written, I could easily imagine them citing 2020 or 2021 as years when preliminary AGI first emerged,. This is very different than my own previous personal forecasts for AGI emerging in something like 20-50 years…

p.s. One of the users above notes that AI Dungeon GPT-3 (“Dragon”) is a subscription service, something like ~$6 a week. MIE.”
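For what it’s worth, the reader’s scaling arithmetic is easy to check. Here is a back-of-the-envelope sketch in Python, using the commonly cited parameter counts (GPT ~117M, GPT-2 ~1.5B, GPT-3 ~175B); the extrapolation at the end is exactly as naive as the email admits:

```python
# Back-of-the-envelope version of the email's projection. Parameter counts are
# the commonly cited ones; assuming growth simply continues is naive by design.
from datetime import date

models = [
    ("GPT",   date(2018, 6, 1), 117e6),
    ("GPT-2", date(2019, 2, 1), 1.5e9),
    ("GPT-3", date(2020, 5, 1), 175e9),
]

for (name_a, t_a, p_a), (name_b, t_b, p_b) in zip(models, models[1:]):
    months = (t_b - t_a).days / 30.4
    print(f"{name_a} -> {name_b}: ~{p_b / p_a:.0f}x parameters over ~{months:.0f} months")

# Naively extrapolate the last jump forward from GPT-3 by the same factor.
last_factor = 175e9 / 1.5e9
print(f"Naive projection for a successor ~15 months out: ~{175e9 * last_factor:.1e} parameters")
```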

Comments

AI arrives when we know it is AI because it is much smarter than us. AI has no rhyme or reason to imitate humans.


This is one area where Scott is talking way beyond his expertise.

Laypeople will run the same (or similar) prompt dozens or hundreds of times and then personally select the few that seem most interesting to display. If you think this through you will realize that any sample extracted in this way measures the intelligence of the user, not the machine.

Neural nets looked extremely promising for a short time, especially around the successes in Go and Chess, but it has become increasingly clear that they are hitting dead ends. The failures in games like Starcraft and Dota 2 were - in my opinion - the most definitive refutation of deep learning, but the slow grind in self-driving cars might be more convincing given the sheer quantity of effort being wasted.

Deep learning is effective for a certain type of restricted problem. However, in the vast majority of cases, deep learning simply does not progress beyond a certain point no matter how much compute you throw at the problem.
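If you want to see the curation effect for yourself, here is a minimal sketch of that best-of-N selection using a small public model (the Hugging Face transformers library with GPT-2 standing in for GPT-3; the scoring function is a made-up placeholder for the human curator):

```python
# Minimal sketch of "best-of-N" cherry-picking with a small public model.
# GPT-2 stands in for GPT-3; the hand-picking step is the point, not the model.
from transformers import pipeline, set_seed

set_seed(42)
generator = pipeline("text-generation", model="gpt2")

prompt = ("The following is a conversation with a highly intelligent AI.\n"
          "Human: What is health?\nAI:")

# Draw many continuations, then keep only the one the "curator" likes best.
samples = generator(prompt, max_length=80, num_return_sequences=20, do_sample=True)

def impressiveness(text):
    # Made-up stand-in for a human curator: prefer longer, more elaborate answers.
    return len(text) + 10 * text.count(",")

best = max(samples, key=lambda s: impressiveness(s["generated_text"]))
print(best["generated_text"])
```

The more samples you draw and discard, the better the one you publish looks.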

" The failures in games like Starcraft and Dota 2..."

That is an odd assessment of results from AlphaStar and OpenAI Five in competition.

I think it's fair to call AlphaStar a failure. It played a couple of demonstration matches, and it had some interesting behaviors, but the limit of its play against real pros was a booth at BlizzCon in 2019. The streamer BeastyQT played it more often than anyone, and he won every map. This is a guy who can't really qualify for tournaments. Not much later, Google declared victory and said the project was too expensive to continue. Apparently the thing was too fragile to survive even tiny game patches from Blizzard and had to be retrained from step one after every patch. There wasn't ever a showmatch against a top pro.

> Not much later, Google declared victory and said the project was too expensive to continue. Apparently the thing was too fragile to survive even tiny game patches from Blizzard and had to be retrained from step one after every patch.

Do you have sources for this?


So I 100% agree with your core point here, but do you have any links to Starcraft & Dota being "failures"? I haven't followed it closely, and that surprises me: I heard the initial results were positive enough, and I struggle to see how it wouldn't be possible, with sufficient resources, to get AI to beat humans at either game.

Is it just that they got to a reasonable point, and then they pulled the plug on the projects due to the games being patched, and it wasn't worth keeping the AI updated? I'm not sure that I'd call that a "failure", more like they just had limited goals?

Broadly, I think the distinction between "stuff we think AI can probably do with enough effort" and "stuff where we don't know how hard it will be for AI" seems very important. Like, as far as I know, there is no poker AI that can hang with top professionals in a full-table game (I know they're pretty good at heads-up). But I struggle to see how you could think that, with sufficient attention, AI couldn't accomplish that fairly quickly. AI development is very expensive, so just because they currently can't do something doesn't mean it's particularly infeasible.

This is just about these narrow games; I fully agree with your broader point. None of the stuff I've seen from GPT-3 makes me doubt my priors about the huge roadblocks to applying AI to such general tasks. But usually in conversation I'm drawing the distinction between narrow, well-defined games (which I think we can basically *always* teach an AI to beat us at, if we put in the developmental effort) and broad tasks. If the big AI research groups actually gave their attention properly to Starcraft or Dota and were unable to develop AI that could match up with humans... I'd be honestly quite surprised? But I haven't followed the recent results closely.


Responding to Ryan: it is misleading to portray your position as dominant among experts in the domain and Alexander's position as one that only laypeople hold. There is no consensus, but I know many very good professional AI researchers who would strongly disagree with what you have written there. The series of posts by Gwern about GPT-3 is informative.

'I think you know I’m not going to pay anything. It would be annoying to have my name splashed all over, but I resigned myself to that back in ~2010 when I decided to set up my website; I see Gwern as a pen name now, and not a real pseudonym. I’m glad it managed to last to 2015.'


People aren't running prompts 'hundreds of times' because it's too slow. What you're seeing on Twitter or elsewhere is picked out of maybe 5 samples at most, even for the hardest problems.

SC2 and DoTA2 were not failures. They demonstrated human-level play on mindbogglingly difficult games: no planning, imperfect information, huge amounts of randomness, a human meta that has developed for decades over millions of serious players, hundreds of units to consider, state-spaces that make Go and chess look like Tic-Tac-Toe and make imitation learning largely useless, highly cyclical strategies which exacerbate catastrophic forgetting, and real-time rather than turn-based. Just being able to control the camera and arbitrary numbers of units in SC2 at human level for a NN is shocking; pointer networks are *not* an obvious architecture, to say the least, and Vinyals invented them mostly for tasks like that.

What they do demonstrate in terms of limits is even more alarming, particularly from a safety perspective, than their actual capabilities: that it's possible to brute force simple model-free DRL up to pro-level while still suffering from bizarre 'delusions' (to borrow the AlphaGo term) where their performance plummets to subhuman levels. With AG, expert iteration turned out to be the elegant and correct architecture that eliminated delusions and created superhuman performance across the board. OA5 and AS ran into the same problem, but OA and DM (not Google; I doubt Google had anything to do with the decision to move on) couldn't crack the puzzle of what is the analogy of expert iteration for model-free learning.

Both their research programmes have moved on to other things (OA, obviously, is now focusing on unsupervised learning), and it remains unknown how to fix OA5/AS. It'll probably involve all of the recent progress in sample-efficient model-free DRL, and making model-based DRL actually work. I have wondered if MuZero would work applied to SC2: it seems that things like AS watching a bunch of Immortals wander by and doing nothing look a lot like failure to plan and open-loop control, and if it was doing any kind of model-based planning, it would have reacted and perhaps retrieved that game.

As for NNs being dead ends... Not even close. iGPT and GPT-3 alone are the writing on the wall for those with eyes to see, rather than old stick-in-the-muds like Brooks or Marcus who haven't produced any real research in decades, and who clutch at reeds like meaningless disclaimers about "may be running into the limits of pretraining" as if that were the death knell. See you all in another 10 years...
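(For anyone who hasn't run into the term: "expert iteration" is the AlphaGo Zero-style loop where a search procedure improves on the raw policy and the policy is then trained to imitate the search's output. A toy, purely schematic sketch of that loop; nothing below is AlphaZero itself, and the game, network, and search are all placeholders:)

```python
# Toy sketch of the expert-iteration loop: a (stubbed) search improves on the raw
# policy, then the policy is trained to imitate the search output. Schematic only;
# the "game", the "network", and the "search" are all placeholders.
import random

ACTIONS = list(range(4))   # toy action space
N_STATES = 10              # toy state space

def policy(state, table):
    # Stand-in "network": a per-state table of action probabilities.
    return table.setdefault(state, [1.0 / len(ACTIONS)] * len(ACTIONS))

def search(state, table, simulations=16):
    # Stand-in for MCTS: perturb the prior and renormalise, treating the result
    # as the stronger "expert" move distribution.
    prior = policy(state, table)
    visits = [p + random.random() / simulations for p in prior]
    total = sum(visits)
    return [v / total for v in visits]

def expert_iteration(games=200, steps=20, lr=0.1):
    table = {}
    for _ in range(games):
        state = 0
        for _ in range(steps):
            target = search(state, table)      # the expert improves on the policy
            current = policy(state, table)
            table[state] = [c + lr * (t - c)   # the policy imitates the expert
                            for c, t in zip(current, target)]
            action = random.choices(ACTIONS, weights=target)[0]
            state = (state + action) % N_STATES
    return table

print(policy(0, expert_iteration()))
```

The open question the comment points at is what plays the role of the stubbed-out search when there is no cheap simulator to search over.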


I agree. I just tried the free GPT-2 just now. Here is the conversation:
> You say "will Trump be reelected"
He looks up and seems happy.
He seems to be floating.
He smiles and says "Correct"
You say "can you elaborate about Trump and his chances ?"
He seems to think about your question for a moment.
He says "It's mathematically impossible"
You say "why would it be mathematically impossible ?"
He seems to think about your question for a moment.
He says "Try and calculate it yourself"
You say "no, explain it to me, you're the AI "
He seems to think about your question for a moment.
He says "If you have any more questions, you should probably go ask your local pixie"

This exchange reminds me of the old ELIZA/DOCTOR programs that were developed decades ago: https://en.wikipedia.org/wiki/ELIZA

I don't see signs that this program has actually broken ground.

I used Eliza as a lab exercise in an intro course I taught years ago. I was a bit surprised to find it's only about 100 lines of code.
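The whole trick really is that small: keyword patterns plus pronoun reflection. A minimal sketch of the idea in Python (a toy illustration of the technique, not Weizenbaum's original script):

```python
# A minimal ELIZA-style responder: keyword rules plus pronoun reflection.
# A toy illustration of the technique, not Weizenbaum's original script.
import re
import random

REFLECTIONS = {"i": "you", "my": "your", "am": "are", "you": "I", "your": "my", "me": "you"}

RULES = [
    (r"i need (.*)",  ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (r"i am (.*)",    ["How long have you been {0}?", "Why do you think you are {0}?"]),
    (r"because (.*)", ["Is that the real reason?"]),
    (r"(.*)",         ["Please tell me more.", "How does that make you feel?"]),
]

def reflect(fragment):
    # Swap first- and second-person words so the reply reads back naturally.
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())

def respond(sentence):
    # First matching rule wins; the final catch-all guarantees a reply.
    for pattern, responses in RULES:
        match = re.match(pattern, sentence.lower())
        if match:
            return random.choice(responses).format(*(reflect(g) for g in match.groups()))

print(respond("I am worried about GPT-3"))
# e.g. "Why do you think you are worried about gpt-3?"
```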

Eliza had an excellent cameo in David Lodge's novel "Small World," which all academics should read.


So basically ELIZA, which I thought was cute 20 years ago.


GPT-3 is 100x better. GPT-2's performance also depends a lot on what resources are allocated. It's not Eliza, though; it searches its data for the types of responses found out in the wild, if I understand correctly.

I will try it when I have time. I have found that these bots bore me after 2 minutes. I can't quite explain why, but it's not an intelligent conversation; it's pseudo-thoughts.


meant as a reply to Ryan


Impressive, but not what I would call AGI.

Has everyone seen the GPT generated transcript of Conversations with Tyler?

https://twitter.com/ilyasu/status/1284150654538149889


". ..act as a pharmacology tutor..."

What does it think about hydroxychloroquine? That's the new Turing test.

Effective against malaria, may cause heart problems.

/pharmacology tutor mode off

The jury is still out on HCQ vs Covid. Currently looks good for prophylaxis and for early treatment. https://c19study.com/


It would be nice if this post and comments were translated in plain English.


"Please note I will not indent any further. . ."

Will someone please explain the meaning of this?

TC is indicating it is a human doing the posting, not a bot.

Ummmm. . . ok, I guess. I'll take your word that is what it means. It makes no sense to me. He uses that phrase a lot. I've never seen that anywhere else in my life but here.

I think what Tyler means is that instead of following the normal blog convention of indenting quotes from other writers (not his own writing), he will not be indenting because it is a long excerpt and Tyler is not interjecting his own writing in between or at the end. Basically, it’s just a way to more efficiently use space on the screen for long quotations of others.

Thank you.


It may also mean that Tyler is not comfortable using the BLOCKQUOTE tag for some reason, which normally takes up minimal screen space. Especially in this case, where the quote seems to end after point 4 without a closing quotation mark - HTML can be brutal on those who don't follow its very simple conventions.

He could also change the "by" line to "a reader."

Or slow burn until we find out that this is in fact GPT-3? Seems too unstructured to me, but it could be.

I could see GPT-3 writing this line: "When future AI textbooks are written, I could easily imagine them citing 2020 or 2021 as years when preliminary AGI first emerged,." but not making this error ",."


Tyler is simply noting that none of this is his opinion. He is not saying that AGI will be here in a year or two; that is the author of the email that was sent to him.


I tried GPT-2. It is more grammatical than a mid-1980s Eliza. Both seem like random generators that can limit their grammatical errors; the newer program is just better at it. For me this program is as much fun as Tyler Cowen would have at a Las Vegas slot machine.

The Turing test is a low hurdle. When someday a machine succeeds in the Turing test we will have demonstrated that there are humans that are low in conscientiousness.


Here's a better AGI test. It's the Brian test. An AGI is elected mayor of an American city of more than 1 million population.

A high bar. There are only ten such cities.

A low bar. Have you seen the current occupants?


Yes, GPT-3 [may] be a game changer. But to get there from here we need to rethink a lot of things. And where that's going (that is, where I think it best should go) is more than I can do in a comment.

Right now, we're doing it wrong, headed in the wrong direction. AGI, a really good one, isn't going to be what we're imagining it to be, e.g. the Star Trek computer.

Think AI as platform, not feature (Andreessen). Obvious implication: the basic computer will be an AI-as-platform. Every human will get their own as a very young child. They'll grow with it; it'll grow with them. The child will care for it as with a pet. Hence we have ethical obligations to them. As the child grows, so does the pet – the pet will likely have to migrate to other physical platforms from time to time.

Machine learning was the key breakthrough. Rodney Brooks' Genghis, with its subsumption architecture, was a key development. FWIW Brooks has teamed up with Gary Marcus and they think we need to add some old-school symbolic computing into the mix. I think they're right.

Machines, however, have a hard time learning the natural world as humans do. We're born primed to deal with that world with millions of years of evolutionary history behind us. Machines, alas, are a blank slate.

The native environment for computers is, of course, the computational environment. That's where to apply machine learning. Note that writing code is one of GPT-3's skills.

So, the AGI of the future, let's call it GPT-42, will be looking in two directions, toward the world of computers and toward the human world. It will be learning in both, but in different styles and to different ends. In its interaction with other artificial computational entities GPT-42 is in its native milieu. In its interaction with us, well, we'll necessarily be in the driver's seat.

Where are we with respect to the hockey-stick growth curve? For the last three-quarters of a century, since the end of WWII, we've been moving horizontally, along a plateau, developing tech. GPT-3 signals that we've reached the toe of the next curve. But to move up the curve, as I've said, we have to rethink the whole shebang.

We're IN the Singularity. Here be dragons.

[Superintelligent computers emerging out of the FOOM is bullshit.]

As an observer of the Brooks school, I agree. Also described here:

OpenAI’s gigantic GPT-3 hints at the limits of language models for AI

"A more fundamental limitation of the general approach described in this paper – scaling up any LM-like model, whether autoregressive or bidirectional – is that it may eventually run into (or could already be running into) the limits of the pretraining objective," write the authors.

There are certainly a lot of really neat things we can do with good and widely available predictive generation systems, and some of them might seem "superhuman." Perhaps GPT-3 will be the one that can look at a plate of food and tell us the nutrition. Or look at us and tell us if we should eat it.

But I'm going to reserve AGI for true symbolic manipulation and understanding of what plate, food, calories, nutrition, and health really are.

Thanks for the link.

You'd think more people would pay attention to Brooks. For one thing, he's created a commercially viable product, Roomba. For another thing, he's really smart.


As I understand it, if you ask GPT-3 "what is health?" you'll get back a well structured answer which is close to the median of the billion "what is health?" answers available on the web. That's useful!

But say "what's a leftist?" fed on MR comments? Or even the web as a whole? It's not going to slice through it all like an Oracle and bring you any core meaning. It's just going to give a plausible answer.

Is a plausible answer good enough for driving? Maybe, if we imagine GPT-23 monitoring all the cars and all the human drivers for a year and doing what they do in each situation, on average. But would *that* even be AGI?

But say "what's a leftist?" fed on MR comments?

LOL!

I'm thinking GPT-23 is going to use all the electrical power in California. GPT-42? It'll need a direct hook-up to the sun. But as a buddy of mine remarked, you can keep a human brain running on a bowl of Cheerios.


BTW, out of curiosity I prompted GPT-2 about "Straussian reading." Here's one response (prompt in bold):

When Tyler Cowen refers to "Straussian reading" he does so in an esoteric way. He doesn't use it to describe the consciously decided product of a person's free will but as a shorthand for "anti-factual reading". Readers are often exposed to a skewed and unbalanced view of reality. The best view of this we have comes from some of the founders of the mathematical science, e.g., Cantor, who developed the mathematical models for Frege-Presleman theorem (that is, Zermelo-Fraenkel set theory) based on the intuition that the universe is not spatially finite and of no interest except to natural science. The best way to realize this intuition is to use "anti-factual reading" –

OK. Rofl. You win.


Is AI a form of racism?

Is anything not a form of racism, or at least privilege?


It is hard to program AIs not to Notice. And the list of euphemisms the nice people use is absurdly long.

So yes, AIs tend to be racist.

I am definitely looking forward to what AI discovery will do to certain political beliefs and pieties.

I imagine constant curation to remove hatefact to the point of retardation of the AI.


Ok, time to stop reading this blog for a week at least

George, please consider staying away for a much longer period of time. Maybe take up community policing in a rough part of town.


From time to time, I envision myself as having a scientific outlook. With AGI, from my perspective the first questions are 1. How do you define it? and 2. How do you measure it? (both extensively discussed, not claiming originality or insight). So, my question is: how do we know if an AGI exists? What if one or more exist today, and their interests are so vastly different from ours that communication with us is not something they choose to do? Are we sure we're all that interesting? Reminds me of the sci-fi books about aliens living at vastly faster (or slower) speeds than us; how would we recognize them? It seems to me that we need to limit ourselves to discussions of the AGI entities/systems which are constrained to communicate/interact with us, and so we must assume such constraints don't run counter to the development of AGI in the first place. 'Training' a child by forcing him/her to watch paint dry for 10-20 years isn't a route to producing a robust, independent adult thinker.


Zero evidence in the reddit thread that Dragon did either of the programming feats the poster claimed. Color me extremely skeptical.
