“Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”
The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.
In other words, the human now adds absolutely nothing to man-machine chess-playing teams. And that's quite apart from the surprising power of this approach at solving problems more generally.
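The "self-play from random play" loop the abstract describes can be sketched in a few lines. This is a toy, assuming a trivial stand-in game (Nim: take 1–3 stones from a pile of 21) and a random policy in place of the network-guided search; none of the names here come from DeepMind's actual code.

```python
import random

def self_play_game(pile=21, seed=0):
    """Play one game against yourself, recording (player, state, move)
    tuples -- the raw material a learner would train on."""
    rng = random.Random(seed)
    history, player = [], 0
    while pile > 0:
        # Stand-in for the network-guided search choosing a move.
        move = rng.randint(1, min(3, pile))
        history.append((player, pile, move))
        pile -= move
        player = 1 - player
    winner = 1 - player  # whoever took the last stone(s) just moved
    return history, winner

history, winner = self_play_game()
print(winner, len(history))
```

The point of the sketch is only the shape of the loop: the program generates its own training data by playing itself, and the "teacher" improves as the player does.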
Here is the link, via Trey Kollmer, who writes “Stockfish Dethroned.” Here is coverage from Wired. Via Justin Barclay, here is commentary from the chess world, including some of the (very impressive) games. And it seems to prefer 1.d4 and 1.c4, loves the Queen’s Gambit, rejected the French Defense, never liked the King’s Indian, grew disillusioned with the Ruy Lopez, and surprisingly never fell in love with the Sicilian Defense. By the way, the program reinvented most of chess opening theory by playing against itself for less than a day. Having the white pieces matters more than we thought from previous computer vs. computer contests. Here is the best chess commentary I have seen, excerpt:
If Karpov had been a chess engine, he might have been called AlphaZero. There is a relentless positional boa constrictor approach that is simply unheard of. Modern chess engines are focused on activity, and have special safeguards to avoid blocked positions as they have no understanding of them and often find themselves in a dead end before they realize it. AlphaZero has no such prejudices or issues, and seems to thrive on snuffing out the opponent’s play. It is singularly impressive, and what is astonishing is how it is able to also find tactics that the engines seem blind to.
Did you know that the older Stockfish program considered 900 times more positions, yet the greater “thinking depth” of the new approach proved decisive nonetheless? I will never forget how stunned I was to learn of this breakthrough.
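How can examining 900 times fewer positions win? Because AlphaZero's Monte Carlo tree search is highly selective: it spends its visits on the moves its network rates promising, rather than exhaustively expanding everything. A minimal sketch of the PUCT selection rule such a search uses, with a uniform prior standing in for the network's policy output (an illustrative assumption; the real network supplies learned priors and value estimates):

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child maximizing Q + U: exploit high average value (Q),
    but add an exploration bonus (U) that shrinks with visit count."""
    total_visits = sum(ch["visits"] for ch in children.values())
    best_move, best_score = None, -float("inf")
    for move, ch in children.items():
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        u = c_puct * ch["prior"] * math.sqrt(total_visits) / (1 + ch["visits"])
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move

# Toy tree: two candidate moves with made-up statistics.
children = {
    "e4": {"visits": 10, "value_sum": 6.0, "prior": 0.5},
    "d4": {"visits": 2,  "value_sum": 1.5, "prior": 0.5},
}
print(puct_select(children))  # → "d4"
```

Here the under-explored move wins the next visit even though its raw average is similar; the search deepens where it matters instead of spreading effort evenly, which is roughly what "greater thinking depth" means in practice.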
Finally, I’ve long said that Google’s final fate will be to evolve into a hedge fund.