Heads-up limit hold’em poker is solved

Here is a bit more on the poker story Tyler mentioned yesterday.

An important variant of poker, heads-up limit hold’em (HULHE), has been essentially solved–meaning that a computer can now play the game so well that it wouldn’t lose much and might even win against a theoretically perfect player over a lifetime of play. Solving the game required new algorithms and significant computational power.

HULHE has 3.16 × 1017 possible states the game can reach, making it larger than Connect Four and smaller than checkers. However, because HULHE is an imperfect-information game, many of these states cannot be distinguished by the acting player, as they involve information about unseen past events (i.e., private cards dealt to the opponent). As a result, the game has 3.19 × 1014 decision points where a player is required to make a decision….There are two challenges for established CFR variants to handle games at this scale: memory and computation.

The solution supports some conventional strategies but also contains new insights:

Human players have disagreed about whether it may be desirable to “limp” (i.e., call as the very first action rather than raise) with certain hands. Conventional wisdom is that limping forgoes the opportunity to provoke an immediate fold by the opponent, and so raising is preferred. Our solution emphatically agrees (see the absence of blue in Fig. 4A). The strategy limps just 0.06% of the time and with no hand more than 0.5%. In other situations, the strategy gives insights beyond conventional wisdom, indicating areas where humans might improve. The strategy almost never “caps” (i.e., makes the final allowed raise) in the first round as the dealer, whereas some strong human players cap the betting with a wide range of hands. Even when holding the strongest hand—a pair of aces—the strategy caps the betting less than 0.01% of the time, and the hand most likely to cap is a pair of twos, with probability 0.06%. Perhaps more important, the strategy chooses to play (i.e., not fold) a broader range of hands as the nondealer than most human players (see the relatively small amount of red in Fig. 4B). It is also much more likely to re-raise when holding a low-rank pair (such as threes or fours) (44).

Why is this important? The only previous games of any difficulty that have been solved are perfect information games–games where each player knows everything that has previously happened. Tic-tac-toe and chess are perfect information games because everything that has happened is summarized in the state of the board. In imperfect games there is hidden information, such as in card games where the opposing players cards are hidden.

It’s clear that most games in the real world (and that includes “games” of nuclear strategy, bargaining, and detection and monitoring) are imperfect information games. Even though the sample space for HULHE is very large it’s smaller than these real world strategy games (and smaller than other forms of poker). Nevertheless, it’s clear that people are “solving” the real world games not by working through the sample space but by pruning it. A combination of search and heuristic pruning in the perfect information game of chess has already produced computers that are better than any human player. What the solution to this relatively small and somewhat unimportant imperfect information game indicates is that the computers are soon going to be better than you and I at the “human” capabilities of threat, bluff and deception.


Notwithstanding, I hate to see a game of chicken, with nuclear weapons, between a human and a robot.

Aww, but I like Matthew Broderick.

LOLOLOL to both...

I was told this guy is on-line and can be played for 100 hands.

Any info?

Perhaps so. Humans can still retain an advantage by making the computer think the game is about one thing, but is really about another (see Michael Crichton's Disclosure for an example).

how about poker when there are only 20 possible cards in the deck...hardly anyone seems to play this variant

"Soon" - is that Cowenian ambiguity? If you mean in less than twenty years, I heartily disagree, and would back that up with a bet.

"“Soon” – is that Cowenian ambiguity?"

Since, Alex Tabarrok wrote this piece, I'd say no, it's not Cowenian ambiguity.

This is pretty cool, but it's kind of like solving 5x5 Go, which was done some time ago.

That said, like the article on convolutional neural networks a little while ago, it suggests maybe there's room to fit more knowledge into search.

The University of Alberta group that did this is really good. They've done a lot of interesting things.

Or it's kind of like pretending that solving the two-body problem in physics means that we'll soon solve the three-body problem, as Colby Cosh noted.

Maybe. I think that, in principle, you could apply this method to more realistic pokers. But in practice it's intractable. But I admit to being surprised that even this application is tractable.

It's like how alpha-beta with extensions is enough for chess, so it ought to work for Go. But it doesn't because Go is so much more complex, so you need entirely new methods.

Yes. I agree that it's a surprising and welcome result, but going from head to head to three handed is enormously more complicated. I'm curious as to whether it's a case where the method could be applied in principle (with more powerful computers, etc.) or not, but certainly not yet able to answer it. There are a decent number of problems where a higher dimensionality means that the proof methods don't work at all.

The wiki link is actually useful for those readers that may not realized that solved does not always mean what the vernacular implies: https://en.wikipedia.org/wiki/Solved_game E.g. chess is not solved perfectly, but a computer can still stomp the vast majority of us.

What does it mean that a game of imperfect information, where player deliberately do not play perfectly (i.e., they bluff) is "solved." It apparently does not mean a dominant strategy has been discovered, such that the program wins everytime. Does it mean the program will on average win against the best human opponents? ver how many repetitions? How do you know? Has the program played everyone?

If two versions of this program play each other, which one wins?

"Playing perfectly" in a game like this means a mixed strategy, i.e., sometimes bluffing. While your apparent position that a dominant strategy must be a pure strategy is one that has intuitive appeal, it is definitely wrong. I suggest reading up on game theory; it's quite interesting. Your questions are all natural ones, worthy of a full discussion in an introductory textbook.

Two versions of the poker program playing each other will probably not result in the same outcome each time, and the weaker version may win, depending on the search tree. This happens all the time in computer chess: often a weaker version of the same program will beat the stronger version, even when they have the same search engine, and the weaker version is searching less of the chess tree than the stronger version. It's paradoxical but observed in practice due to a variety of things having to do with search depth and 'knowing too much for your own good', or maybe 'out-bluffing yourself'?

Also, it's a mixed strategy. Look for example at this figure from the paper. There are several cells with "mixtures of colors representing a stochastic decision." Therefore, even identical versions playing each other will, based on random seeds, make different decisions when faced with the same situation.

@John Thacker - thanks, but the chess phenomena has been observed with the same random seed and the two programs searching the chess tree in exactly the same order, with the stronger program searching more of the tree. Nevertheless, the stronger program will lose because it seems a potentially promising variation that turns out to be a fatal blind alley or trap. The 'dumb' chess engine blithely plays a 'simple' move that seemingly falls into the trap but that turns out to be a refutation, and beats its stronger cousin. It's Ecclesiastes 9:11: "race doesn't always go to the swiftest"

Yes, that's true that that happens with chess programs, but that's also another way of stating that chess is not solved. What's different here is that this variant of poker may be solved, but is not a perfect information game (like chess), so mixed strategies happen. It's a different sort of phenomenon.


What it means to be "solved" is that the computer's strategy cannot be exploited.

Imagine that I have the following strategy: every time I have a solid hand, I just call your bets, and every time I have a bad hand I go crazy with bluffing, betting and raising the maximum. Once you figure out that's what I'm doing, you can exploit it. (That is, you can deviate from your strategy in order to do better against mine.) In the example, you can call me when I go crazy, and you can slow down when I call.

If I have solved the game, that means I play in such a way that you can't exploit my strategy. You can't deviate from yours, because no matter what you do, you'll be unable to get a (long-term) advantage.

Right, and people should realize that "solved" perfect play in such a game will not always be optimal against imperfect humans, because it won't necessarily exploit weaknesses in some strategy. An example in the Wikipedia article is a Rock-Paper-Scissors strategy of perfectly randomly picking an option with 1/3rd probability-- this certainly won't do as well against poor, predictable Bart Simpson as Lisa always throwing paper.

In the abstract, it says "weakly solved". I infer that to mean winning in the short term but not the long term.

"Ellie Kesselman January 9, 2015 at 8:25 pm
In the abstract, it says “weakly solved”. I infer that to mean winning in the short term but not the long term."

There should be no difference between the long and short term, its merely a question of sample size - the more iterations of the game that run, the more likely the results will reflect the true ability of the players, in the same way that a fair coin will be heads or tails 50% of the time, but any one iteration cannot reflect that, and in fact we may need to continue tossing a coin for a long time before the reality of the equal probability of either result is manifest.

I presume "weakly solved" is a technical term referring to some limitation in how far the game tree has been mapped - it seems they have said that its not that every single permutation has been mapped, but that they are close enough that there is no need to continue.

Certainly if there was a difference between the long and the short term that could not be explained by the sample size then it would be no sort of solution at all.

... not just the vast majority of us, all of us

Does this mean online poker websites are in big trouble? If someone could set up parallel instances of this and have it play against humans they'd rake it in.

No, because this is for a simple form of poker and extensions to normal poker are extremely challenging.

That said, it's an impressive result.

Poker is small stakes. Think commodities, stocks, derivatives, etc.

Those other things are orders of orders of magnitude more complex and computers are nowhere near 'solving' financial markets. Poker is a game with simple rules, and with 2 players they can be (and apparently now have been) modeled and 'solved'.

Exactly what I thought! I was asked if this solves or even can be used as an adjunct decision-making tool for, say, the game theoretic aspects of merger arbitrage or the to-and-fro in securities issuance. The answer is "no", not at all.

It is interesting and cool, but is not obviously extensible (extendable?) to other things... yet.

As other commenters have suggested, this not a model of real poker as it is played in the overwhelming majority of games. In most casinos, and indeed most friendly games, the game is *no* limit (not "limit"), and there are a large number of players, at least eight. So when extrapolating to real poker one should proceed cautiously.

The biggest money poker games ever played were heads up limit hold'em between billionaire Andrew Beal and a collective of professional poker players called "the corporation", who alternated playing him. There is a book about it. So, yeah, it isn't the most common form of poker but its still an important variant.

Joe Teicher is wrong in an important detail: the game was heads-up no-limit, not limit poker. That is a big difference. The book that describes this is “The Professor, the Banker, and the Suicide King: Inside the Richest Poker Game of All Time” by Michael Craig.

Did you read the book, King of Hearts? It was limit poker.

I don't know too much about the way computers analyze poker situations, but as an expert human player I can assure you that increasing the number of players dealt into a hand simplifies the situation immensely. One-on-one no-limit poker is way way more complicated than nine-player games, which I believe is still the standard casino form.

Can I get a computer to haggle the price of my next car for me?

Maybe I'm missing something, but doesn't this essentially prove that Texas Hold'em poker is a game of skill?
If so, it wouldn't be viewed as gambling from a legal point of view.
(Legal Definition: Risking something of value to receive something of value in a Game of Chance)

No, not really. In fact, I believe one could argue essentially the reverse-- that if a game has been solved, and all players barring suckers should have access to the optimal strategy, then the result is chance. Baccarat (chemin-de-fer) and blackjack, for example, can be considered games of chance partially because the optimal strategy* is so well known.

* - For blackjack the true optimal strategy involves card counting and various important information about shuffling, size of the shoe. But the infinite draw deck basic strategy is well known and even for sale at casinos.

For example, Rock-Paper-Scissors is *solved* according to this standard definition (perfect strategy-- decide your throw each time independently one of the options with 1/3rd chance). Do you believe that it is truly a game of skill?

Checkers is solved. Is checkers a game of skill? Clearly yes. There must be some distinction between knowing what to do and being able to actually do it.

Sure, I agree that there's somewhere to draw the line, even if it's not easily articulated. IBut given that we know, e.g., that all two player games with perfect information are solvable, we essentially never prove nowadays that something *can* be solved. However, once we have a constructive proof of a solution, we weaken, not strengthen it. Checkers with both players having access to a computer with a solution, however, is not much of a game of skill. (Only chance if, say, opening moves are randomized.)

I don't think I agree with the argument that a constructive solution, an optimal strategy, completely reduces a game to a game of chance rather than skill. But I think that's certainly a much stronger and cogent argument than the reverse, the argument that solving a game demonstrates that it is one of skill.

Weaken, not strengthen, the skill component of the game, at least under the assumption that the winning strategy can be disseminated to all players.

And in tournament checkers-- the opening three moves *are* randomized, so therefore one can make a decent (even if not decisive) argument that it is a game of chance; certainly computer-assisted it is.

Oy you naive saps. I want to invite you all over for poker so I can clean you out.

This computer has not "solved" the game of heads-up limit hold em. What the authors actually claim is to have "essentially weakly solved" the game, whatever that means. What they appear to have done is design a bot that randomly varies how it plays each kind of hand according to a pre-programmed calculus of the theoretically ideal proportions at which the various options, from fold to maximum bet, should be played in order to give the opponent no advantage from consistently playing any particular way.

That kind of strategy is powerful to opponents who don't understand what you're doing. But opponents who cotton on can defeat it by using bets to send misleading signals to manipulate the probability of a bet or fold. The reason bots still can't beat good players isn't that they don't know enough about the math of the game, it's that they haven't even begun to fathom human deception.

Put it up against a top player. It will lose.

No, you are completely wrong. You can try manipulating it by trying to using bets but you are not going to succeed. Game theory tells you that there is a strategy that can not be beat on average, no matter how the opponent plays.

In Rock-Paper-Scissors it is easy to see that there is such a strategy. No amount of human ingenuity is going to give you an edge against an opponent that plays each move with probability of 1/3, independently of what has happened before.

In poker it is not obvious how such a strategy looks like, but game theory assures us that it exists. The authors of the article claims to have computed something that is nearly as good as this strategy in a simple but real version of poker.

The most interesting question for economics I think is how close human play would be to perfection. We are talking about a domain with clear rules, large incentives to get it right, chances to learn by engaging in it repeatedly. If play here is far from the equilibrium strategy what chance does game theory hold to predict things in other domains?

Just to verify my understanding: In checkers, "completely solved" means that if I get to make the first move I can always win the game with this strategy, irrespective of what moves the opponent plays? Right?

In HULHE or these imperfect info games I don't understand what "solved" means: Does this new strategy always enable you to win, no matter what cards were dealt & what moves the opponent plays?

If not what does a "solved" imperfect info game mean?

The strategy does not enable you to win every iteration because of the element of chance - if you make a decision in poker that will be good if the next card is anything other than the Ace of Spades, then you have clearly made a good decision, but there is nothing you can do about the fact that sometimes it will be the Ace of Spades that falls next.

There is no variance of this sort in checkers, since none of the pieces are affected by anything other than the input of the players, so there is no "luck", unless you get lucky in the sense that your opponent gets distracted and makes an uncharacteristically poor decision, something which is unlikely to happen if you are playing a computer, although I suppose there could be a power cut just as it is about to deliver the coup de grace.

In reality solving HULHE is about creating a strategy in which, over the long term (millions of hands in all likelihood), we cannot lose, and our opponent cannot hope to win. That is not to say that he won't win the next hand and the one after that, just that if he keeps playing he will eventually lose, or at the very best break even. (See my post about Rock, Paper, Scissors in this thread for more detail.)

Thanks @Robin.

I perfectly agree with your first two paragraphs about how HUHLE is different from, say, chess because of the randomness aspect & the hidden information on top of that.

What I don't get is this: Say you produce a very good HUHLE playing strategy / algorithm that does win very often. Say in the long run it is almost sure to win most of the time (say, 99.99% of games it plays). But is that sufficient to call it "solved"? What if someone then comes up with another better strategy that wins 99.9999% of HUHLE games. Is that more "solved" than the earlier "solved"?

Note that with the non-random, perfect-info games like checkers or chess this problem does not arise: When they claim checkers has been "solved" it means that they can exactly predict if a game will or will not result in a win.

But your Rock-Paper-Scissors explanation does make sense. So essentially they discovered the analog of a "throw exactly randomly each time" strategy? Which because of the complexity of HUHLE over RPS would be much harder to think up than just "play RPS exactly randomly"?

And they you try to play smarter than the drawing-strategy but worst case you can always default to a guaranteed-to-draw strategy if you start losing?

"What the solution to this relatively small and somewhat unimportant imperfect information game indicates is that the computers are soon going to be better than you and I at the “human” capabilities of threat, bluff and deception."

This quote suggests a misunderstanding of how the game has been solved. There is no deception involved. There is "bluffing" but not in order to deceive, merely in order to balance out the amount of good and bad hands we bet with. Attempts to "solve" HULHE make no attempt whatsoever to deceive the opponent. In fact, the solution should be such that we can give our opponent a written summary of our strategy and he will not be able to take advantage of it, the best he can hope to do is break even, by exactly mimicking our strategy.

The easiest way of thinking about it is to consider a more simple game for which there is an easy to understand Game Theory Optimal (GTO) solution - Rock, Paper, Scissors.

If we throw exactly randomly each time, we have an unexploitable strategy - we can write down our strategy, hand it to our opponent, and they will still be able to do nothing about it, because if you think about it, what are they going to do - throw more rocks? more paper? Their only option is to attempt mimic our strategy, and if they succeed then the game will ultimately be a draw (tie), although of course each iteration will have a winner and we may need to play for quite some time to get a sample size large enough to reflect the fact that two players throwing randomly are exactly evenly matched. (Obviously in reality a human is incapable of choosing truly randomly, but a computer can get as close to doing so as makes no difference)

The object is the same in solving limit holdem, instead of throwing an exactly even amount of rock, paper and scissors we want to bet,call and fold in an unexploitable, GTO way with every hand we could possible be dealt from the best to the worst, in every possible situation, ie on every possible flop, turn and river and versus every possible action our opponent could take (ie vs every possible check,bet raise or fold)

The reason this strategy will beat even the best human player is because of the complexity of HULHE (which incidentally is dramatically more simple than most other forms of poker due to the limitations on betting amounts and the fact that there are only two players) - the human will tend towards the poker equivalent of having a mild preference for rock, paper or scissors, ie he will tend to bluff slightly too much, or bluff slightly too little, or call slightly too much,etc, and in this way the computer will very gradually win in the same way the random computer throwing exactly even amounts of rock paper and scissors will eventually punish the human who is incapable of mimicking exact randomness.

Comments for this post are closed