The great Ken Regan on AlphaZero

A must-read for anyone who has been following this issue, Ken considers how close to God AlphaZero actually came:

We must pause to reflect on how clarifying it is that this single heuristic suffices to master complex games—games that also represent a concrete face of asymptotic complexity insofar as their size n-by-ngeneralizations are polynomial-space hard…

It may be that we can heuristically solve some NP-type problems better by infusing an adversary—to make a PSPACE-type problem that hits back—and running AlphaZero.

That sort of thing.  And don’t neglect the comments.


God exists and He is a chess program. Now if you begin to feel an intense and crushing feeling of religious terror at the concept, don't be alarmed. That indicates only that you are still sane.

AlphaZero does not play dices, He plays chess.

Cowen's chess posts always remind me of War Games, in which WOPR observes "the only winning move is not to play", and expresses a preference for "a nice game of chess." It's no doubt true that nuclear war will bring us closer to God, but I'm not so sure about a game of chess.

Nice reference to movies and chess. I wonder if the movie got the chess board set up correctly. Often they put the dark square of the board to the right rather than the left.

"He's a lover of life, but a player of pawns".

Ian Anderson was on to this decades ago.

The opponent gives a line of symmetry, giving the machine some line of repeatability in moves, allowing the machine to characterize the typical set of moves available.

In watching the game, the machine will trap the opponent in one corner, allowing him a limited sequence of moves for some powerful piece, like the castle or bishop. So th machine can actually model tree of move sequences as typical, l any move outside the tree is a serious loss, on first principle. When a portion of the move tree is so isolate, then the machine can deal with a move generator, which has a norm, tempo, the init of account in chess.

So, we have 'moved space', really, index space. By pruning the opponents tree and expanding your, you get more tempo, you have more choices when interleaving your move with the opponent. Tempo is conserved and win the game, the strategy is to make the opponents generators less expansive than yours.

Well, when the Alphazero news first broke I went to Ken's site but he hadn't yet commented. Finally!

Now I will summarize how AlphaZero works, in a simplified manner, without having read Ken's post, just to see how close my intuition is. There are two types of chess solving algorithms: one type is alpha-beta with exhaustive search of the chess tree to n-ply (moves, typically you search exhaustively about 10 ply or five full moves for white and black; note you cannot exhaustively search the entire chess tree since the number of moves is astronomical) then limited, non-exhaustive search beyond the n-ply, and the other type is Shannon B type forward pruning (limited search of the tree, based on 'promising moves'). Traditionally only the first type was deemed best. The critical feature of alpha-beta however is how you search the tree in a limited, non-exhaustive search beyond the n-ply. The way it was done in the 1970s is to use primitive rules to search 'more promising' moves according to simple criteria like 'search moves involving check', 'search moves where the knight is centered', 'search moves where the rook is behind an isolated pawn' and so on. About 15 years ago a program called "Fruit" used human master analysis to come up with even better criteria for selective search, too difficult to summarize here, like for example 'search moves in the Sicilian where Rc8 captures Nc3 (exchange sacrifice)'. This program blew away the older chess programs. What AlphaZero is doing is a very complicated version of what Fruit did, but, instead of using 100s or 1000s of hard coded, fixed rules generated by human chess masters to do a selective search, it is generating millions, possibly tens of millions, of such rules, using auto-play. AlphaZero generates these rules and further does 'forward pruning' where it ignores non-promising moves, under the following theory: in a winning position, there are many ways to win, but in a losing position usually at best only move (if at all) saves the game with a draw. So AlphaZero plays at every position random moves, sees if the position is 'winning' or 'losing', and stores in a hashtable/dictionary (a programmable ASIC) the winning position. Millions of these winning positions are saved, and in future games, AlphaZero, when confronted with a similar position, knows from experience this is a promising position to search further, or, if the position is 'nearly similar', to try and achieve (that is, make moves to reach the known winning position).

I would say the difference between AlphaZero and a traditional chess program is evolutionary not revolutionary, unless you are a chess programmer. I say this: AlphaZero to a layperson is NOT doing the following: learning the moves of chess from scratch, then, playing random moves until it learns how to play like a grandmaster. That's known as "genetic programming" and that's how AlphaZero was represented to the general public, but it's not really how AlphaZero works. Nevertheless, the experts said that 'forward pruning of the Shannon B type is inferior in chess programming to the canonical alpha-beta algorithm' and AlphaZero proved them wrong, so from a chess programmers point of view, AlphaZero is indeed revolutionary.

Speaking as an amateur programmer who does not work in the field. Now I go to IM Ken Regan's site to see how close to God I got.

Bonus trivia: a good period piece movie on chess programming is Computer Chess from a few years ago.

That is all you have to say? I was looking forward to a longer analysis.

I read the Regan post, and it deals with something a bit different, but the closest comment, and quite good, about what I talk about above is by "gwern" in that thread. Check it out, it's quite good and confirms what I said. Of interest Monte Carlo Tree Search (MCTS) is only used by AlphaZero to come up with the 'millions of expert rules' that I talk about, when AlphaZero is training, but not in the actual play by AlphaZero, which is more of a traditional alpha-beta algorithm (as I implied above).

> Of interest Monte Carlo Tree Search (MCTS) is only used by AlphaZero
> to come up with the ‘millions of expert rules’ that I talk about, when
> AlphaZero is training, but not in the actual play by AlphaZero, which is
> more of a traditional alpha-beta algorithm (as I implied above).

I don't think so. From the paper:

> During training, each MCTS used 800 simulations


> We also analysed the relative performance of AlphaZero’s MCTS
> search compared to the state-of-the-art alpha-beta search engines
> used by Stockfish and Elmo. AlphaZero searches just 80 thousand
> positions per second in chess and 40 thousand in shogi, compared to
> 70 million for Stockfish and 35 million for Elmo.

so like the proverbial turtles, it is MCTS all the way down. The rules may be encoded inside of a NN, but then again, when a beginner learns chess the rules are encoded inside a NN as well, albeit a wet-ware one - the truly impressive thing about the human brain and where it differs from AlphaZero is that it encodes the tree search algorithm as well inside the neural net.

(ps - one of the things that astonishes me about that paper is the number of simulations they used to train it. A measly 800? why not 8000? or 80000? you'd think the NN would be able to encode better rules if it had better visibility to the end-state of the game, but I guess 800 is enough to get this. Also, along the same lines, I'm curious exactly how much adding a tablebase in training would help increase their ending ELO. They didn't want to do that because they were more interested in testing out the algorithm than making the strongest possible opponent, but hell, I'm sure a lot of chess players would be interested in this.

"Millions" ?
"IM Ken Regan"
"Millions" ---- not millions, try a different number.
"IM Ken Regan" ----- "IM" means nothing in this context, IM is just a time management award compared to what Cowen thinks Regan is doing here.
Thanks for the insights, though.
Check out Steve Hsu on qubits and gates today. The perfect chess game is going to be a detail in the corner of a really good oil painting if those guys keep working hard.
Or not.
It depends - did they take the time to clear a corner of the apartment, near the window that looks over 34th Street and its humble streetlights, for the Christmas tree?
Do you know how many cities in this country have a 34th Street? And how many Christmas trees there are in third floor apartment windows, visible to the person who, as Tyutchev liked to say, had no home but could only walk by?

ok so I misunderstood millions - I thought you were referring to millions of positions, which is a gross underestimate.
Gwern was a very frequent poster back in the day on Robin Hanson's blog, maybe Gwen still is. If you read the classic introduction to Slonimsky's biographical dictionary of musicians, you will note that he references someone like Gwern.
The tyutchev poem is 171 in the Jude translation - "Poshly, Gospod', svoyu otradu...".

Qubits, the tyutchev poem numbered 32 in the Jude translation, you remind me of it.

Thanks for reading. Heinrich Heine wrote the original version of that poem. The internet persona is not me: In the Heine poem, someone like Hsu (or some other AI enthusiast) would be the "young man": as for me "I know my redeemer liveth". But I think you knew that. I thought it would be nice to introduce someone with modern interests to a really good and very non-modern Tyutchev poem. Even on these basement classroom hall bulletin boards that these day-after comment threads essentially are, such things happen.
Tyutchev and Fet are the two most translatable Russian poets, in my humble opinion.

“sees if the position is ‘winning’ or ‘losing’, and stores in a hashtable/dictionary (a programmable ASIC) the winning position. Millions of these winning positions are saved,”

Only by the most lose usage of language could this be considered correct! :)


Comments for this post are closed