A new computer algorithm can play one of the most popular variants of poker essentially perfectly. Its creators say that it is virtually “incapable of losing against any opponent in a fair game”.

…That means that this particular variant of poker, called heads-up limit hold’em (HULHE), can be considered solved. The algorithm is described in a paper in Science¹. The strategy the authors have computed is so close to perfect “as to render pointless further work on this game”, says Eric Jackson, a computer-poker researcher based in Menlo Park, California.

“I think that it will come as a surprise to experts that a game this big has been solved this soon,” Jackson adds.

…Bowling and colleagues designed their algorithm so that it would learn from experience; getting to its champion-level skills required playing more than 1,500 games. At the beginning, it made its decisions randomly, but then it updated itself by attaching a ‘regret’ value to each decision, depending on how poorly it fared.

This procedure, known as counterfactual regret minimization, has been widely adopted in the Annual Computer Poker Competition, which has run since 2006. But Bowling and colleagues have improved it by allowing the algorithm to re-evaluate decisions considered to be poor in earlier training rounds.
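The regret-matching update at the heart of counterfactual regret minimization is simple to sketch. The toy example below is purely illustrative, not the authors' CFR+ code: the game, opponent, and iteration count are made up. A learner repeatedly plays rock-paper-scissors against an opponent who always plays rock, accumulates a regret for each action (how much better that action would have done than the learner's current mix), and then mixes in proportion to positive cumulative regret.

```python
# Illustrative regret-matching loop, the core update inside CFR.
# Rows = our action (rock, paper, scissors); cols = opponent's action.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def regret_matching(regrets):
    """Mix over actions in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)  # uniform if no positive regret
    return [p / total for p in positive]

regrets = [0.0, 0.0, 0.0]
strategy_sum = [0.0, 0.0, 0.0]
ITERS = 1000
for _ in range(ITERS):
    strategy = regret_matching(regrets)
    for a in range(3):
        strategy_sum[a] += strategy[a]
    opp = 0  # the fixed opponent always plays rock
    expected = sum(strategy[a] * PAYOFF[a][opp] for a in range(3))
    for a in range(3):
        # regret: how much better action a would have done than our mix
        regrets[a] += PAYOFF[a][opp] - expected

average = [s / ITERS for s in strategy_sum]
print([round(p, 3) for p in average])  # the average concentrates on paper
```

Averaged over iterations, the strategy concentrates on paper, the best response. In full CFR this same update runs at every information set of the game tree, with regrets weighted by counterfactual reach probabilities, which is where the name comes from.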

The other crucial innovation was the handling of the vast amount of information that needs to be stored to develop and use the strategy, which is of the order of 262 terabytes. This volume of data demands disk storage, which is slow to access. The researchers devised a data-compression method that reduces the volume to a more manageable 11 terabytes and adds only 5% to the computation time from the use of disk storage.

“I think the counterfactual regret algorithm is the major advance,” says computer scientist Jonathan Shapiro at the University of Manchester, UK. “But they have done several other very clever things to make this problem computationally feasible.”

The computer does engage in a certain amount of bluffing, full story here, via Vaughan Bell.

From the article: “Bowling and colleagues have looked at one of the most popular forms, called Texas hold’em. With just two players, the game becomes heads-up, and it is a ‘limit’ game when it has fixed bet sizes and a fixed number of raises. ”

So only solved in a two player variant, but I imagine that solving it for any number of players is probably relatively trivial.

Not trivial; it’s exponentially harder due to the combinatorial explosion.

There are

1,624,350 ways to deal two hands

1,832,266,800 to deal three hands

1,896,396,138,000 to deal four hands

etc

846,980,238,813,394,205,520,000 to deal eight hands

That’s about 521 quadrillion times as many ways as there are to deal two hands, and that’s not even counting all the combinations of potential bets and raises. The big challenge in solving a game is the sheer size of the game space.

Thank you for that info. The combinatorial explosion does seem daunting, but I wonder if their compression algo (mentioned in the article) could trim this to something more manageable.

Agree. And then there’s the social interaction and reciprocal modelling of other players by each player.

My dissertation – so many years ago – involved multi-player game theory and bargaining models. Extending simple models of 2 player bargaining to three players made these models much more complex (and largely unsolvable). While I am sure there have been advances in theory and computational methods since then (more of the latter than the former in this area, I would guess), I would expect modeling multi-player games to be much (much) more complex than modeling 2 player games.

I see you’ve never had to deal with a major raise by the big blind with a communal pot…

Oh wow, that is some key information to leave out of the summary post. You lose almost the whole game once poker is down to just two people.

Interestingly, while heads-up is exponentially easier for a computer, it tends to be exponentially harder for a human – computers have no problem calculating equities but deal poorly with large game trees, whereas humans tend to have good heuristics for pruning large game trees but deal poorly with evaluating hands with middling equities (e.g. we are good at knowing hands which have 90% or 0% equity but have a hard time distinguishing between the 40% and 60% hands).

Note that solving a multi-player version would have to require modeling the situation where two or more players are colluding, either via actual cheating (letting each other know their hole cards via hidden channels) or via coded information in their bets as is done between bridge partners. Worse, it would require modeling the situation where the computer algorithm was colluding with other players. It’s pretty common for games to be solvable for two players and unsolvable for more due to considerations like these.

“I think the counterfactual regret algorithm is the major advance,”

About time someone modeled the Walk of Shame.

TD – Proust did that a hundred years ago, using approximately 50 thousand French words. Which need to be gone over twice, but no more than that.

If they could scale up the program for five-player tables, these researchers could get some serious ROI at various online gambling sites. Assuming that computer-assisted betting is legal at some of these, the fact that their program made them real money would only add to the fame and weight of their theoretical/academic work.

this has been happening; there are Polaris-esque bots out there. Poker sites try to be on the lookout for bots and will ban the username if it is found to be a bot, but not all of them get caught, so YMMV

also HULHE was going to be the first type of hold’em solved due to the limited factors involved (2 players, limited decision tree). While this may be the first true GTO bot, Vegas used to have a HULHE video poker game whose AI was pretty much GTO.

Interestingly, that’s not necessarily the case. Playing the Nash equilibrium only guarantees that no one is going to pick your pocket, but it’s perfectly possible that it only gives you a tiny edge over a very weak player. A Nash equilibrium strategy doesn’t do opponent modeling, because doing so could be exploited by a very strong opponent.

I recently solved the game of Goofspiel for 10 cards, optimizing for win probability, not score. It took 8 GB of memory; I’m happy to push to the 13-card game if someone has a machine with 5 TB of RAM. The optimal strategy only has a 3% edge over a very dumb player who simply matches the card in his bids.

An article I read about the program states that it is a weak-form Nash, so not airtight. And yes, an exploitative strategy would be more profitable for the bot against weaker players, but as you said, deviating from the equilibrium exposes the bot to being exploited. The Vegas game relied on this principle: at worst (by breaking even) it earned the rake, and (very) often players were out of their depth and making sub-optimal plays.

As to your point, you are right that at high stakes you are less likely to see bots in large numbers, as playing against a pro who is close to GTO would only result in the poker site winning the rake. But at lower stakes, or bum-hunting whales, it should be profitable, especially over multiple tables.

Hmmm. Article says authors acknowledge an “extremely small margin by which, in theory, the computer might be beaten by skill rather than chance. But this margin is negligible in practice.”

Is this really proven? Who has the computer beat? What limits are assumed? Poker’s just another game if the limits are low. If you can knock a player out with one bet, it’s different, and I’ll bet on the champion-level human.

As Yogi said “In theory, theory and practice are the same, in practice they’re not”

yeah, this one is all about those qualifications. Heads up play, and with a limit.

Because even great players lose to The Lucky Ones when they go all in at a 95% win chance but get killed on the river.

This is a great example of the kind of technique that is sweeping AI at the moment: don’t try to code a strategy, just code a learning algorithm. It’s a highly successful technique, not just in playing games but also in domains like computer translation, for which the most successful algorithms today don’t rely so much on actually parsing the grammatical structure of sentences as on simply recognizing lexical patterns. Despite their success, the resulting algorithms are intellectually kind of disappointing; in the end you haven’t so much identified a strategy as assembled an unimaginably lengthy and apparently random set of rules, kind of like an actual implementation of Searle’s Chinese room.

I believe Chess programs are still not part of this trend. While they do use some arbitrary rules, I think their primary strategy is still look-ahead decision trees, i.e. something that looks like a real analysis of the game. I don’t know if this is because the technique just doesn’t work well or because the leading Chess playing programs were all written by a previous generation of AI programmers before this trend swept the profession. I’d be interested to know.

And Arthur B, because this technique doesn’t do any real game analysis, it’s not clear that the combinatorial explosion of possible states that occurs as you add players is really a relevant block. Certainly it isn’t a technical block, although in practice the algorithms might become significantly less good with more players.

“don’t try to code a strategy, just code a learning algorithm.” As a layman, it strikes me that what is going on in part is that big increases in computer power are making (not quite) brute-force approaches more practical, so that you don’t have to be quite so clever in designing algorithms.

This will be debunked. The writer clearly has no idea how poker is played and is simply repeating claims he doesn’t understand. There is no perfect poker strategy because consistency is exactly what you have to avoid. Also, Poker being an evening out luck over time game, it’s extremely laborious to test bot strategies. There is no way they have proven this thing is impossible to beat. That would require a huge amount of play with top players.

“There is no perfect poker strategy because consistency is exactly what you have to avoid”.

Yes there is, poker is a game for which a Nash equilibrium is proven to exist. When you say “consistency”, you’re probably thinking that you can’t play in a way which telegraphs your hand too much, but Nash equilibrium strategies can take this in account.

For instance, take a very simplified “toy game”: I can bet 10 chips into a pot of 10 chips, and you can call the bet or fold. Half of my hands say “Winner” and half say “Loser”.

The Nash equilibrium is that I should bet enough to make you indifferent to calling or folding. If I bet all my winners and half of my losers, then you are “paying” 10 chips for a 1/3 chance of winning 30 chips, so the expected value if you call or if you fold is 0. You should call with a probability that will make me indifferent to bluffing. The payoff for my bluff is risking 10 chips to win 10 chips, so if I am called half the time, I am indifferent to bluffing. Therefore, the Nash equilibrium for this game is: I bet all my winners and half of my losers, and you call 50% of the time and fold 50% of the time. Or in other words: there is no way that either player can improve their expectation by deviating from this strategy.
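That indifference argument is easy to verify numerically. Here is a minimal sketch of the toy game's expected values (function names and structure are made up for illustration; the chip amounts are as described above):

```python
from fractions import Fraction as F

POT, BET = 10, 10

def caller_ev(call_prob, bluff_frac):
    """Caller's EV per hand, given the bettor bets all winners (half of all
    hands) and a fraction `bluff_frac` of losers."""
    p_winner_bet = F(1, 2)            # bettor holds a winner and bets
    p_bluff = F(1, 2) * bluff_frac    # bettor holds a loser and bluffs
    # facing a bet: calling wins POT + BET vs a bluff, loses BET vs a winner
    ev_call = p_bluff * (POT + BET) + p_winner_bet * (-BET)
    return call_prob * ev_call        # folding is worth 0

def bluffer_gain(call_prob):
    """Extra EV from bluffing a loser instead of just giving up the pot."""
    return (1 - call_prob) * POT + call_prob * (-BET)

# At the equilibrium (bluff half the losers, call half the time):
print(caller_ev(F(1, 2), F(1, 2)))   # 0: the caller is indifferent
print(bluffer_gain(F(1, 2)))         # 0: the bluffer is indifferent
```

Plugging in other frequencies shows why the equilibrium is stable: for example, if the bettor bluffs three quarters of his losers, `caller_ev(F(1,2), F(3,4))` becomes positive, so the deviation hands the caller expected value.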

“Also, Poker being an evening out luck over time game, it’s extremely laborious to test bot strategies”

There are a number of techniques to mitigate the effect of luck. One I believe they used for this was “duplicate hands” – have a bot play a human, while another bot plays a different human, dealing the exact same cards but switching the hands the bot and the opponent get dealt. So if Bot A is getting dealt better cards than its opponent, Bot B will be facing an opponent who is being dealt better cards. This reduces a lot of short-term variance.
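The variance reduction from duplicate dealing can be seen in a toy model where each hand's result is a small skill edge plus a large luck term. Everything here (the edge size, the additive luck model) is made up purely for illustration:

```python
import random

random.seed(1)

def play(skill_edge, luck):
    """Toy result of one hand: small skill edge plus large card luck."""
    return skill_edge + luck

SKILL = 0.05   # the true edge we want to measure (made-up number)
N = 10000

# naive: fresh random cards every hand
naive = [play(SKILL, random.gauss(0, 1)) for _ in range(N)]

# duplicate: each deal is played twice with the seats swapped, so the luck
# term enters once with each sign and cancels in the pair average
duplicate = []
for _ in range(N // 2):
    luck = random.gauss(0, 1)
    duplicate.append((play(SKILL, luck) + play(SKILL, -luck)) / 2)

def mean(xs):
    return sum(xs) / len(xs)

print(round(mean(naive), 3))      # noisy estimate of the 0.05 edge
print(round(mean(duplicate), 3))  # 0.05: luck cancels in this additive toy
```

Real poker luck is not perfectly additive, so duplicate dealing does not cancel variance completely there, but it removes the large "who got dealt the better cards" component and makes skill differences measurable in far fewer hands.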

Is a Nash equilibrium proven to exist for no-limit poker, even heads-up? And in limit games, is Nash equilibrium proven for more than two players? As a layman who sometimes plays nano-stakes and at home games, these questions are very interesting to me. Thanks for your comment!

Yes. John Nash proved that a Nash equilibrium (possibly in mixed strategies) exists in any game with a finite number of players and a finite set of choices. Assuming you do not allow betting fractions of the smallest chip and play with a finite number of chips, you will have a Nash equilibrium.

Thanks for the reply, guys. Is it a different equilibrium for different stack sizes? Is the equilibrium strategy “differentiable” in some sense? I.e., in a NLHE game, if the guy to my left has a 5,000-chip stack, is the Nash equilibrium strategy guaranteed to be close to the Nash equilibrium strategy for when he has 5,001 chips?

The equilibrium strategy is not proven to be “differentiable”. In many games, changes in strategy based on different starting situations can be large.

Yep, that’s the result Nash is famous for. Any finite game has at least one mixed-strategy Nash equilibrium.

Poker players don’t care about Nash equilibriums. In theory some Nash equilibrium exists for every single possible hand or partial hand in every possible situation, but that Nash equilibrium is NOT the ideal way to play that hand. The ideal way to play that hand will depend on your opponents’ style of play. The surest way to lose in poker is to follow consistent rules.

Um, they surely do, at least at any moderate level of skill. The poker literature is full of this stuff. I don’t think it appeared in Brunson, but Sklansky was writing about it 30 years ago.

That game is nothing like poker. Games like poker with multiple betting rounds are all about signaling and reading signals. Bots aren’t good liars or readers of liars. That’s why bots still aren’t as good as humans.

That technique for testing doesn’t really save time. Poker can’t be judged by one hand – a bot’s ability can only be judged by playing the same opponent over a long period of time. A good player will learn how to mislead it and it will not learn how to mislead a good player.

Tom, plenty of people who know game theory read this blog, as do the proprietors, and it should tell you something that only you (who clearly do not know game theory) are having this reaction.

By the way, one can play online against the bot. If you are curious I recommend you try it.

They should get a patent.

And, name it after Alex.

If this did what they said, I would predict there would be no article about it; instead they’d be mass-deploying it on the sites that have large heads-up hold’em cash games.

There haven’t really been any large HU LHE games online for several years, because bots have existed which are close enough to GTO for a while now. The advance from 95% GTO to 99.9% GTO is largely academic, the latter is good enough to kill the games.

So, still a long way from doing social science! Après moi, le déluge. 🙂

No.

I don’t think the program can always win in the long term. Expected value, for instance, should be zero if the program is pitted against another version of itself?

Yes, what they mean by solved is that this program cannot be beaten, not that it will always win. Playing it against itself will lead to a long-term draw.

You can’t beat a perfectly randomizing rock-paper-scissors bot either. The difference there being you can’t lose. It doesn’t matter what you do, you will draw against the RPS bot. You will lose to the poker bot unless you play its own or an equally good strategy against it, in which case you will draw.
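The rock-paper-scissors claim is easy to check: against a uniformly randomizing bot, every strategy, pure or mixed, has expected payoff exactly zero. A small illustrative check:

```python
from fractions import Fraction as F

# payoff matrix for the row player: rows = your move, cols = the bot's move
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def ev_against_uniform(strategy):
    """Expected payoff of a mixed strategy vs a uniform random RPS bot."""
    uniform = [F(1, 3)] * 3
    return sum(strategy[i] * uniform[j] * PAYOFF[i][j]
               for i in range(3) for j in range(3))

# pure rock, or any mix at all: the EV is always 0, because each row of the
# payoff matrix sums to zero and the bot weights the columns equally
print(ev_against_uniform([F(1), F(0), F(0)]))        # 0
print(ev_against_uniform([F(1, 2), F(1, 4), F(1, 4)]))  # 0
```

That is exactly the "you can't lose, but you can't win either" property: the uniform bot concedes nothing and extracts nothing, whereas the poker equilibrium additionally punishes any opponent who deviates from equally good play.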

They should go back in time 10 years and sell it to Andrew Beal.
