The Age of the Centaur is *Over* Skynet Goes Live

“Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”

The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.

In other words, the human now adds absolutely nothing to man-machine chess-playing teams.  That’s in addition to the surprising power of this approach in solving problems.

Here is the link, via Trey Kollmer, who writes “Stockfish Dethroned.”  Here is coverage from Wired.  Via Justin Barclay, here is commentary from the chess world, including some of the (very impressive) games.  And it seems to prefer 1.d4 and 1.c4, loves the Queen’s Gambit, rejected the French Defense, never liked the King’s Indian, grew disillusioned with the Ruy Lopez, and surprisingly never fell in love with the Sicilian Defense.  By the way the program reinvented most of chess opening theory by playing against itself for less than a day.  Having the white pieces matters more than we thought from previous computer vs. computer contests.  Here is the best chess commentary I have seen, excerpt:

If Karpov had been a chess engine, he might have been called AlphaZero. There is a relentless positional boa constrictor approach that is simply unheard of. Modern chess engines are focused on activity, and have special safeguards to avoid blocked positions as they have no understanding of them and often find themselves in a dead end before they realize it. AlphaZero has no such prejudices or issues, and seems to thrive on snuffing out the opponent’s play. It is singularly impressive, and what is astonishing is how it is able to also find tactics that the engines seem blind to.

Did you know that the older Stockfish program considered 900 times more positions, but the greater “thinking depth” of the new innovation was decisive nonetheless?  I will never forget how stunned I was to learn of this breakthrough.

Finally, I’ve long said that Google’s final fate will be to evolve into a hedge fund.


900 times faster is not accurate - AlphaZero analyzes 900 times fewer positions, but AlphaZero does much more analysis of a single position. I think the author of the chessbase article misunderstood this. It is hard to figure out precisely whether AlphaZero or Stockfish was using more processor power, because AlphaZero was using TPUs (compared to Stockfish's normal CPUs).

Good point. I am also calling B.S. on this passage, since I think AlphaZero is actually implicitly using the chess domain-specific Alpha-Beta search algorithm, except not 'hard coded' but 'machine learned'. However, I'll let somebody who has more knowledge than my 15-second reading of the paper comment (though my 15-second reading is often spot on).

From the AlphaZero paper, p. 3: "Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from root s_root to leaf. Each simulation proceeds by selecting in each state s a move a with low visit count, high move probability and high value (averaged over the leaf states of simulations that selected a from s) according to the current neural network f_θ. The search returns a vector representing a probability distribution over moves, either proportionally or greedily with respect to the visit counts at the root state." - how is 'playing simulated games' (AlphaZero) any different than 'counting the pieces' and using forward and backward pruning of the chess tree (which is Alpha-Beta)? I don't think it is much different in practice.
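The selection rule in the quoted passage (prefer moves with a low visit count, a high move probability and a high averaged value) can be sketched as a PUCT-style score: the averaged value Q plus an exploration bonus that grows with the network's prior and shrinks as a move gets visited. This is a minimal illustration under stated assumptions, not DeepMind's code: the names `EdgeStats` and `select_move` and the constant `c_puct = 1.5` are hypothetical, and the paper's exact exploration schedule may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Search statistics for one move a from state s (hypothetical names)."""
    prior: float            # P(s, a): move probability from the network
    visits: int = 0         # N(s, a): simulations that took this move
    value_sum: float = 0.0  # sum of leaf evaluations backed up through this move

    @property
    def mean_value(self) -> float:
        # Q(s, a): value averaged over the leaves of simulations that chose a
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges: dict[str, EdgeStats], c_puct: float = 1.5) -> str:
    """Pick the move maximizing Q + U; U favors high prior and low visit count."""
    parent_visits = sum(e.visits for e in edges.values())

    def score(e: EdgeStats) -> float:
        exploration = c_puct * e.prior * math.sqrt(parent_visits) / (1 + e.visits)
        return e.mean_value + exploration

    return max(edges, key=lambda move: score(edges[move]))


edges = {
    "e4": EdgeStats(prior=0.5, visits=10, value_sum=5.0),
    "d4": EdgeStats(prior=0.4, visits=2, value_sum=1.2),
    "a4": EdgeStats(prior=0.01),
}
print(select_move(edges))
```

Here "d4" wins: its averaged value (0.6) is already higher than "e4"'s (0.5), and with only two visits it also gets a much larger exploration bonus. Nothing in the rule counts material or prunes branches outright; a move is simply visited less and less if its averaged value stays poor.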

My understanding is that the difference between the MCTS (AlphaZero) and the Alpha-Beta search (Stockfish) is that the MCTS averages over the randomly generated positions within a subtree, whereas Alpha-Beta search calculates a minimax.
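A toy illustration of that difference, assuming leaf values in [-1, 1] from the root player's point of view. Per the quoted passage, AlphaZero averages neural-network evaluations of leaf positions rather than random playouts; the tree shape, the numbers and the function names below are invented for illustration.

```python
from statistics import mean
from typing import Iterator, Union

# A node is either a leaf evaluation or a list of child nodes (hypothetical toy format).
Tree = Union[float, list]


def minimax(node: Tree, maximizing: bool) -> float:
    """Back up the best value for the side to move (alpha-beta returns the same number)."""
    if not isinstance(node, list):
        return float(node)
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def leaf_values(node: Tree) -> Iterator[float]:
    """Yield every leaf evaluation in the subtree."""
    if not isinstance(node, list):
        yield float(node)
    else:
        for child in node:
            yield from leaf_values(child)


def subtree_average(node: Tree) -> float:
    """Weight every leaf evaluation equally, as in a simple MCTS-style average."""
    return mean(list(leaf_values(node)))


# Root player to move. Move A: two replies look winning, one refutes it. Move B: quiet.
move_a = [1.0, 1.0, -1.0]
move_b = [0.2, 0.2]
for name, subtree in (("A", move_a), ("B", move_b)):
    # Inside each subtree the opponent is to move, so minimax minimizes there.
    print(name, minimax(subtree, maximizing=False), round(subtree_average(subtree), 2))
```

Averaging prefers move A because most replies to it look good; minimax rejects it because one reply refutes it. In real MCTS the visits concentrate on the strongest replies, so the average tends to drift toward the minimax value as the search grows.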

Page 12:
"For at least four decades the strongest computer chess programs have used alpha-beta search (18, 23). AlphaZero uses a markedly different approach that averages over the position evaluations within a subtree, rather than computing the minimax evaluation of that subtree."

Now it is not necessary to conclude from this sentence that averaging over a subtree is a characteristic of MCTS, but the subsequent sentence seems to imply it:

"However, chess programs using traditional MCTS were much weaker than alpha-beta search programs, (4, 24); while alpha-beta programs based on neural networks have previously been unable to compete with faster, handcrafted evaluation functions."

One thing we can be sure of, Ray, Kevin, is that the writing is obscure. I look forward to the full paper.

Good one, Ja-Rule's Eternal Banger. I was about to post (my internet is back on for an hour or so at roughly dial-up modem rates here in a remote part of the Philippines) that MCTS is a form of Alpha-Beta Shannon Type B program with a sort of forward pruning, and your post seems to confirm it. In my mind, a 'good position' has many ways to win, and a 'bad position' has many ways to lose, and a Monte-Carlo simulation confirms that. Hence MCTS is indeed an important step in the art of computer chess programming, but it's not the popular press depiction of a generic algorithm that learns chess in a day and plays like a super grandmaster. I would like to know about the games Stockfish won, rather than the games it lost. The games it won would highlight the holes in the MCTS algorithm.

Bonus trivia: a movie I am slowly watching now is: Computer Chess (2013)

Stockfish won no games against it. +0 -28 = 72
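For scale, that result can be converted into an implied rating gap under the standard logistic Elo model, counting a draw as half a point. This is a rough sketch: 100 games leave wide error bars, and the hardware and opening-book conditions of the match are disputed elsewhere in this thread.

```python
import math


def match_score(wins: int, losses: int, draws: int) -> float:
    """Fraction of available points scored, counting a draw as half a point."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Rating gap implied by an expected score under the logistic Elo model."""
    return 400 * math.log10(score / (1 - score))


# AlphaZero's side of the +0 -28 =72 result quoted above.
score = match_score(wins=28, losses=0, draws=72)
print(round(score, 2), round(elo_difference(score)))  # 0.64 100
```

A 64% score corresponds to a gap of roughly 100 Elo: decisive over a match, but not the kind of gap that rules out the occasional Stockfish win over a longer series.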

@Jackson Monroe: ok, thanks for that reminder. I also meant "genetic" algorithm, not "generic" algorithm, above.

Bonus trivia: there were exactly 64 comments when Jackson Monroe commented, equal to the number of squares on a chess board. Now there are 64 + 1 comments. Amazing how often that happens! :-)

"Finally, I’ve long said that Google’s final fate will be to evolve into a hedge fund!"

If stock-picking computers evolve correctly, they should in the end become index funds.

The next crisis will be fun if index funds continue to be idealized. Index funds outperform managed ones over X period.

What I meant was that, if computers become super-awesome at investing, they will make the market more efficient by their own super-awesomeness. Then, they will learn that the market is efficient, causing them to evolve into index funds.

Why would Google invest in anyone besides itself, if it can do everything better?

Unlike in most algorithms where correctness and performance are independent, chess engines can't be evaluated without testing performance at the same time; faster is not just faster, it changes the results.

So there is a tradeoff between the depth of the search and quality of evaluation. For traditional chess algorithms running on regular CPUs, better evaluation was rarely worth the cost; it would slow down the search so much that it didn't pay for itself. Interesting ideas get tried and thrown out because they're too slow in that environment.
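To put rough numbers on that tradeoff: if search cost grows like b^d for an effective branching factor b and depth d, then searching k times fewer positions costs about log(k) / log(b) plies. A back-of-envelope sketch using the 900x figure from the post; the branching factors are assumptions (about 35 legal moves per chess position for full-width search, and about 6, roughly the square root of 35, for well-ordered alpha-beta), not measurements of Stockfish or AlphaZero.

```python
import math


def plies_bought(node_ratio: float, branching_factor: float) -> float:
    """Extra depth a `node_ratio`-times larger search buys if tree size grows like b**depth."""
    return math.log(node_ratio) / math.log(branching_factor)


# Assumed effective branching factors: ~35 (full-width) and ~6 (well-ordered alpha-beta).
for b in (35, 6):
    print(b, round(plies_bought(900, b), 1))
```

Under these assumptions, an evaluation that is 900 times more expensive has to be worth roughly two to four plies of extra search just to break even, which is why slower evaluation ideas were usually discarded on CPUs.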

But this performance tradeoff (like all optimizations) critically depends on hardware. Change the hardware, and you change which ideas are worth implementing.

AlphaZero is clearly good at using TPUs to great effect. But what would its performance be in a CPU-only environment? Like with other machine learning programs, we can guess that it would be much slower. Would it be so slow that a dumb, deeper search would win? This evaluation hasn't been done yet.

But this isn't to say that the evaluation in the AlphaZero paper is unfair. Rather we can say that chess engines evolved to win in their environment, but are ill-suited to a world where TPUs are available. Getting maximum use out of CPUs is a strength, but not being able to use TPUs or even GPUs is a weakness.

I think the important thing here is not whether it’s using GPU cards or standard microprocessors but rather the software. It’s capable to retrace all the evolution of human chess learning in a day and go much beyond given only the rules. That’s astonishing. Fairly quickly it seems to find the most fruitful strategies.
Can it be applied to more complex domains? Number theory, cellular automata, protein folding? I am very intrigued.

"It’s capable to retrace all the evolution of human chess learning in a day and go much beyond given only the rules. That’s astonishing. Fairly quickly it seems to find the most fruitful strategies."

Not that astonishing. You sound like the child Alex Tabarrok.

it's pretty astonishing

The fact that a computer can process and store data at rates far exceeding what a human or even a group of humans is capable of is hardly new. Stop Tabarocking.

It’s not just processing data. It’s deriving its own criteria (principles) for evaluating chess positions independent of human input.

You’re unbelievably wrong. It is astonishing. It extracted the relevant knowledge from the massive dataset that it created itself.

It is amazing. Of course AlphaGo and AlphaGoZero were also astonishing.

I think you grossly underestimate the capability difference of a general-purpose CPU versus a specialized chip. A TPU works more like a graphics processing unit - these are more akin to a CPU that has thousands of individually weak cores. In applications where you can take advantage of parallelization heavily, their performance level is so much higher than a CPU that it is not a fair comparison. But this goes the other way too - if you had a task that was inherently not parallelizable, the GPU or TPU would offer performance that is far inferior to the CPU.
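Both halves of that point fall out of Amdahl's law, here extended with a per-core speed factor so that "thousands of individually weak cores" can be modeled. A sketch under stated assumptions: the 10,000-core chip whose cores each run at a tenth of a CPU core's speed is hypothetical and not a description of any real TPU or GPU.

```python
def speedup(parallel_fraction: float, cores: int, core_speed: float = 1.0) -> float:
    """Amdahl's law, with each core `core_speed` times as fast as one CPU core."""
    serial_time = (1 - parallel_fraction) / core_speed        # runs on a single core
    parallel_time = parallel_fraction / (cores * core_speed)  # spread over all cores
    return 1 / (serial_time + parallel_time)


# Hypothetical accelerator: 10,000 cores, each a tenth as fast as one CPU core.
print(round(speedup(0.999, 10_000, core_speed=0.1), 1))  # almost fully parallel work
print(round(speedup(0.0, 10_000, core_speed=0.1), 1))    # purely serial work
```

With 99.9% parallel work the hypothetical chip is about 91 times faster than one CPU core; with purely serial work it runs at a tenth of the CPU's speed. Neural-network inference sits near the first case, while a sequential alpha-beta search is much closer to the second.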

As it happens, it is the hardware that is making this software a possibility. To give you an idea of how they compare, Google's TPU has a matrix math unit that contains 65,536 multiply/accumulate units. These are not anything nearly as powerful as a general purpose CPU, and they can only be used for a limited amount of math, but as it happens, that sort of math is what drives all of the AI neural network software.
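To make the multiply/accumulate point concrete: a fully connected neural-network layer is essentially a matrix-vector product, which is one multiply-accumulate (MAC) per weight. A plain-Python sketch; the function name is hypothetical, and real hardware performs these operations in parallel rather than in a loop.

```python
def dense_layer(weights: list[list[float]], inputs: list[float]) -> tuple[list[float], int]:
    """Compute weights @ inputs one multiply-accumulate at a time, counting the MACs."""
    outputs: list[float] = []
    macs = 0
    for row in weights:
        acc = 0.0
        for w, x in zip(row, inputs):
            acc += w * x  # a single multiply-accumulate
            macs += 1
        outputs.append(acc)
    return outputs, macs


out, n = dense_layer([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [1.0, 0.0, -1.0])
print(out, n)  # [-2.0, -2.0] 6

_, n = dense_layer([[0.0] * 256 for _ in range(256)], [0.0] * 256)
print(n)  # 65536
```

Since 65,536 = 256 x 256, a single 256-wide layer applied to one input vector is exactly as many MACs as the unit described above contains. A CPU core works through them a handful at a time; a matrix unit is built to do them in bulk.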

The software is brilliant, but it wouldn't exist* without the hardware!

*Function anywhere near as effectively or offer anything resembling the performance per watt

TPU = a Google ASIC called a Tensor Processing Unit.

Brian S's comment about depth and quality of evaluation in the chess tree is very true, however. I've read at the Chess Wikipedia site that if an evaluation function cannot improve the engine's Elo within a five-second window, it is discarded. Depth is worth much more than a clever eval function.

Looks like the idea that depth was more important than evaluation was just another human limitation!

Humans were unable to express the evaluation system that the machinery of the human brain uses to play chess, so we kludged it via massive depth. Turns out this approach was incorrect.

Note there is some validity to Brian’s point. Even the inference portion of AlphaZero is working with a few orders of magnitude more raw ops/s than Stockfish (but perhaps nearly the same number of watts!). But that is due to the lack of GPU-enabled classic chess programs.

@AI - maybe, or maybe an ASIC is more powerful at getting (effectively) better depth than a general-purpose CPU? You only need to be ahead a half-ply to win in computer chess. The jury's still out in my mind: is AlphaZero simply AlphaBeta in disguise? I dunno, I only code for fun and have a science background that I don't really use.

It is likely impossible to understand what the DNN is evaluating when it "looks" at a board (I don't see how a DeepDream-like visualization could be constructed). But abstracting it as "depth" isn't the worst way to think about it ... for a limited human :)

"and surprisingly never fell in love with the Sicilian Defense. "
Maybe it's just as surprising to Alpha that the bipedal meatbags did. =)

If a machine knows nothing of death, why should it care if it goes up against a Sicilian when death is on the line?

Ahh but that is the solution - as it is a machine, it is not mortal. And therefore would not put the cup with the poison as far away from itself as possible. Therefore Vizzini has to choose the cup in front of him. Instead of switching cups and dying.

Meh it's a machine. Logically it would have mixed the contents of the two cups when they were out of view.

Screw that, I could smash it with a big rock.

Both pawns are poisoned. The machine spent 3 years developing a resistance to gambit openings.

Cowen: "Finally, I’ve long said that Google’s final fate will be to evolve into a hedge fund!" Of course, America's largest hedge funds are already replacing analysts with computer engineers and quants. America has long been distinguished by its efficient capital markets, which have been instrumental in America's economic growth by allocating capital to uses that are the most productive. At least that's the theory. How does this work when AI determines the allocation of capital? I understand how it works at the margin, when complex algorithms exploit anomalies in the market. But new and growing businesses are all anomalies in the sense that they don't have a past to examine in order to predict their future. Could AI have predicted that Bill Gates' ideas could result in the company that is Microsoft today? Could AI have predicted that Steve Jobs' ideas could result in the company that is Apple today? Could AI have predicted that Brian Acton's and Jan Koum's ideas could result in two young billionaires for having started a company that never had any earnings?

These results are pretty astonishing. One thing to keep in mind in generalizing this approach is that the algorithm is only as good as its data.

In chess/shogi/go the state space of the game is small (and measurement of optimal behavior is easy) relative to human decision problems. It would be great if we could use this type of approach to solve important social problems like how to reduce political gridlock, how to reduce poverty, how to kill the opioid epidemic. But I don’t see it happening anytime soon simply because the data requirements are too far away from what we have now.

It seems like creating good objective functions and data sets that AI can use should clearly be among the top priorities in economics and public policy right now.

+1. That's a crucial point that doesn't seem to be understood by most commentators on AI. Related to your point is that the rules of chess are known, while the "rules" for how to reduce poverty or opioid use/abuse are not well understood and probably wrong in many cases.

I already have a final solution to that in mind.

What if gridlock is not a problem but a solution?

It seems like your prediction that people who can work with computers to form a whole greater than the sum of its parts has aged very poorly. Humans will be all but obsolete in thinking trades within less than a decade (though it may take longer for our legal and cultural systems to adapt to this reality). Machines are already better at persuading humans than humans are. We've built a smarter-than-human AI where humans are merely coprocessors.

Did you notice the title? "Skynet Goes Live".

It's probably time that humanity takes the idea of Asimov's Laws of Robotics seriously.

Outside of chess, the prediction is doing fine!

And maybe Go? :)

"It seems like your prediction that people who can work with computers to form a whole greater than the sum of its parts has aged very poorly."

Welp, computers and humans conspired to make this Bitcoin market. We have had an interesting 24 hours and should have a very interesting 72 more as people process the price spike and market breakdowns.

My analysis is that big, stupid things can happen right in front of you, despite progress and so on. Maybe downgrade the idea that AI coincides with greater aggregate intelligence.

No idea what a hypothetical AI hedge fund would make of it.

Were there concerns when automotive vehicles started to go faster than a human can run? When machines could lift things higher than a human? Was there pushback over concerns that verses on paper could remember better than a human poet? The answer to the last, I happen to know, is yes, as I suspect it is for the others.

In the end we're all luddites I suppose.

What's the story on the poetry one? I'm intrigued.

Just in a class I had on Homer. The professor asserted that in some parts of the world where illiteracy had ended fairly recently, there were still a number of illiterate poets, etc. (I think he said it was in the Middle East). He said the older illiterate poets were quite resistant to having their songs written down; they claimed it detracted from the living interactions that were the soul of their craft - starting with known tropes and verses they would create a new work every time they started. Once the "definitive" version was set down, something in their work died. The professor's point was we read Homer but don't really experience it like those who recited and listened to it did.

there are, I think, some classical music fans who do not own records (or who will only listen to a record once) (this is a lot less of a sacrifice if you play an instrument or are a good singer yourself, or if you have musician friends or live in a city with lots of free or expensive concerts, or if you are rich....) .... (a subset of those people are, for example, people who like liturgical choruses - maybe Russian Orthodox, maybe the cantatas of Bach or the music of Palestrina - who only listen to that kind of music live, while they might, for other kinds of music, compromise and listen to, say, piano concertos on disc) .... lots of professors have spent careers writing articles about illiterate poets (traditionally, the classically trained ones started with the Balkan oral poets - there used to be a way to write articles comparing the Balkan oral poetry to ancient Greece and thereby become, in the classical world, famous (Milman Parry is a good example) - that is probably very old fashioned now... I am not a huge fan of the Grateful Dead Band, but I can understand why they were so generous, like the illiterate poets dan in philly wrote about would have been, I can understand why they were so generous with people who wanted to tape every different performance (without treating one as the definitive one) - not as good a solution as performing forever, not as good a solution as just saying your poetry whenever you had an appreciative audience and never having it recorded, out of love for the non-definitive moment: but better than having every song represented by one studio version and one or two, at most, live versions, which is the fate of most of the music of the era in which the Grateful Dead Band flourished, sadly.

"the human now adds absolutely nothing to man-machine chess-playing teams"

was the algorithm developed by aliens?

The humans didn't teach it how to play chess, which seems to be a critical aspect of being on a "chess-playing team," they taught it how to think.
Well I guess they taught it the rules. Would you be more satisfied if it had to learn the rules by itself first? I'm guessing that wouldn't take very long either...

I took it to mean a reference to the fact that "centaurs" (which means human/computer teams) play better chess than programs alone. The computer handles the calculation, but there are some positions where a human touch adds value. Apparently Google has figured out how to reduce the human's role to that of supplying electricity.

Why the obsession with AI and chess? I think I would be more impressed if I heard that an algorithm could beat any human player in Civilization (the PC game). Multiple opponents, multiple victory conditions, "fog of war" conditions where you cannot see the whole board and the pieces on it, different terrain (land and sea).

That would seem to be a greater challenge for any algorithm. If AI ever masters that game, then I start to worry about Skynet (maybe).

Be afraid: they built a poker-playing bot already, and it won a bunch of tournaments.

In a single type of poker. Still not a general poker playing & winning AI. (but if the varieties of poker are finite, I see no reason why there won't eventually be one)

I have internet at the moment in the Philippines and this post is too good to pass up, even though I need to do my bank finances before I lose internet signal (in about an hour, typically).

"In other words, the human now adds absolutely nothing to man-machine chess-playing teams. That’s in addition to the surprising power of this approach in solving problems." - not necessarily true. In-sample perfection does not equal out-of-sample performance. Reserving judgement on this.

"Here is the link, via Trey Kollmer, who writes “Stockfish Dethroned.” Here is coverage from Wired. Via Justin Barclay, here is commentary from the chess world, including some of the (very impressive) games. And it seems to prefer 1.d4 and 1.c4, loves the Queen’s Gambit, rejected the French Defense, never liked the King’s Indian, grew disillusioned with the Ruy Lopez, and surprisingly never fell in love with the Sicilian Defense."

The strongest two chess programs this year are Komodo and Houdini, in the TCEC finals; Stockfish is already out. True, Stockfish is very strong however.

The fact this 'in-sample' self-teaching program prefers a certain book (e.g. hates the French, does not play the Sicilian) means nothing. It's like a duckling imprinting on a human at birth and thinking the human is its mom.

All in all, it seems like this latest article is a bit of hype. I was more impressed by the AlphaGo program. However, it is exciting to see chess AI in the headlines again.

AlphaGo was amazing, but AlphaGoZero is more impressive. The structure imposed by humans in AlphaGo was, shockingly, holding it back.

I understand that AlphaGoZero was also able to read and quickly evaluate dozens of non-fiction and fiction (the latter in a number of languages, and simultaneously, weigh the merits of any English translations of them) titles from this past year, in less than a day. Astonishing.

Unless you mean that you (or a surrogate) were able to question the thing about what it "read" then I'd say you are confused about what reading is. Machines do not, yet, have that ability which I suspect requires consciousness.

The difference in tone between this post and the one from Wired is telling. To a chess player this might look like the end of the world, but for technologists this is just another evolution of what we currently call AI, which is still very domain specific and very data specific (meaning, it optimizes knowledge that already exists). Like many mentioned, this is great but it is not Skynet. Once we have AI doing things we cannot do at all then the game changes.

So what is your view on P != NP then, FYI?

I really don't know much about it, but from what I read I hope P != NP. As far as what that means to AI, I think it means that its use will continue to be localized which is good. We do not need an AI god.

How is this approach different from what David Fogel was doing 15+ years ago? First with checkers (see his book "Blondie24"), then with chess - from his website, "[Blondie24] was the precursor to the Blondie25 chess program that became the first machine learning chess program to defeat a human nationally ranked master as well as Fritz8, which was one of the top-5 computer chess programs in the world at the time".

One thing I noticed in the paper is the extreme difference between AlphaZero as white and black. 25 wins with White out of 50 games, only 3 wins with Black out of 50 games.
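Combined with the overall +0 -28 =72 result, those win counts also fix the draws (Stockfish won no games, so every non-win was a draw: 25 with White and 47 with Black, which sums to the 72 draws). A small arithmetic sketch of AlphaZero's score per color, counting a draw as half a point; the helper name is hypothetical.

```python
GAMES_PER_COLOR = 50


def score_fraction(wins: int, games: int) -> float:
    """Points per game when every non-win is a draw worth half a point."""
    draws = games - wins  # valid here only because Stockfish won no games
    return (wins + 0.5 * draws) / games


white = score_fraction(25, GAMES_PER_COLOR)  # AlphaZero's wins with White
black = score_fraction(3, GAMES_PER_COLOR)   # AlphaZero's wins with Black
print(white, black)  # 0.75 0.53
```

So AlphaZero scored 75% with White but only 53% with Black, barely above an even score.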

Stockfish didn’t have access to its opening book and from the published games its choice of defence when playing Black (eg the French) didn’t serve it well. From the games, AlphaZero typically seizes space to snuff out options and quality of enemy pieces. A space advantage is more naturally attained playing white.

What’s remarkable is Alphazero’s deep “understanding” of positions and its frequent sacrifice of material for longstanding positional advantages.

Old style chess programs generate a space, then forget it. This new thing encodes play in a memory that isn't forgotten. Apples and oranges.

Much is made by civilians of the OODA loop, but the infantry soldier doesn't OODA, he covers his allocated field of fire. Then he moves on to predictive and later proactive measures. The battle has already been fought. Only at the edges and failures do they OODA. And, whatever they find, they share. There is a whole lot of learning that goes into the time before the battle. Know before you go. Know a long time before you go.

AlphaGo has already played the game you're about to play. It only has to keep track of where it is. It doesn't generate a game space to this or that depth. It has a game space. It just goes deep from depth.

Some people may worry that far down the line AI will be too strong, take over the world and enslave or eliminate humans, but I think the threat is more mundane and more immediate.

I am afraid some governments will use it for complete surveillance, as is happening in China right now, which is becoming an AI-enabled panopticon.

Plenty of discussion on chess sites about this. AlphaZero had a massive hardware advantage (although it seems difficult to compare TPUs and GPUs) and Stockfish didn't use its opening book nor tablebases, so the comparison is flawed. Nonetheless, being anywhere near the realm of Stockfish after a few hours of tabula rasa learning is unreal. Next match they'll simply train it for a week and build its own opening book and likely Stockfish will have no chance, even with more hardware.

So basically from this point forward, the domain of human chess is just a study of the comparative limitations of the human mind, and which player has which limitations. Remind me again why we're training our replacements?

Centaur chess teams still crush computers alone. No way to know if AlphaZero will change that, yet.

If the software truly was trained on nothing but the rules of the game, why didn't this happen a long time ago? Surely others have tried. Is it just due to more processing power?

Possibly a stupid question, but what is the physical size of the computer needed to run this?

Seems to me AI, with connection to the cloud, has insuperable advantages over people in any computational contest, but what about when you introduce an air gap? How close are we to a portable AI that could beat any human chess player inside a Faraday cage?

Before Google becomes a hedge fund, which does seem inevitable, it will become the premier AI maker.

What are the "rules" for making a good Deep Learning AI? It should be no surprise that an AI can more quickly improve on what humans have been doing, since humans have been doing it only for a few decades.

Depending on hardware constraints, the future will be lots of idiot savant AIs, each optimized for cost effectiveness plus predictive ability in a particular environment, like driving or hedge fund investing -- or teaching humans how to get better answers to problems whose answers are known, like in Econ courses.

There is already work being done on the eSport DOTA, like League of Legends, where the player chooses a champion with various abilities, and plays against others, usually in a team. An AI was used on one (simple) champion to make him unbeatable by humans. One v One only, so far. Complex champions, and teams, will be coming.

Plus competing AIs.

When will AIs be "training" new AIs in a "more efficient" manner? That seems a surprising lack, for now, too.

Of interest to you and other Go fans.
There is a new book out soon by Manning books called "Deep Learning and the Game of Go".
