Poker Bot Battles Humans to a Draw

NBC: A poker showdown between professional players and an artificial intelligence program has ended with a slim victory for the humans, so slim, in fact, that the scientists running the show said it’s effectively a tie. The event began two weeks ago, as the four pros (Bjorn Li, Doug Polk, Dong Kim and Jason Les) settled down at Rivers Casino in Pittsburgh to play a total of 80,000 hands of Heads-Up, No-Limit Texas Hold ’em with Claudico, a poker-playing bot made by Carnegie Mellon University computer science researchers.

…No actual money was being bet — the dollar amount was more of a running scoreboard, and at the end the humans were up a total of $732,713 (they will share a $100,000 purse based on their virtual winnings). That sounds like a lot, but over 80,000 hands and $170 million of virtual money being bet, three-quarters of a million bucks is pretty much a rounding error, the experimenters said, and can’t be considered a statistically significant victory.

The computer bluffed and bet against the best poker players the world has ever known and over 80,000 hands the humans were not able to discover an exploitable flaw in the computer’s strategy. Thus, a significant win for the computer. Moreover, the computers will get better at a faster pace than the humans.

In my post on opaque intelligence I said that algorithms were becoming so sophisticated that we humans can’t really understand what they are doing, quipping that “any sufficiently advanced logic is indistinguishable from stupidity.” We see hints of that here:

“There are spots where it plays well and others where I just don’t understand it,” Polk said in a Carnegie Mellon news release…. “Betting $19,000 to win a $700 pot just isn’t something that a person would do,” Polk continued.

Polk’s careful wording–he doesn’t say the computer’s strategy was wrong but that it was inhuman and beyond his understanding–is a telling indicator of respect.

Comments

“Betting $19,000 to win a $700 pot just isn’t something that a person would do.” If it's a strategy, it is a very bad one. First, the risk-reward ratio is off the chart. If another player were sandbagging a superior hand, the loss would be monumental. The other problem is that although the program cannot exhibit physical tells, this would lead other players to recognize its betting pattern; baiting it into that situation and then pouncing would be the way to win the game. The scene in the TV show Person of Interest where the computer is forced into a game of chicken is an apt analogy.

Can hardly wait for the anecdotes to begin flowing full force of poker players, et al., THINKING that they are thinking like some computer they heard of: or has the flow begun and I'm not paying attention?
(Alternatively, staking $19K for a $700 pot may be a sign that high frequency trading consists of computer traders with merely human operators.)
As the card-playing algorithms improve in quality, the mind stumbles to think of an all-robotic casino in fifty years or so: cocktail waitresses will be robotic and dispense alcohol to losing customers all the more freely, other algorithms will have to be composed so that patrons are permitted to win on occasion, but in a very convincing fashion . . . as if they won a pot all by themselves.
The house always wins, hunh?

Good comment, but it's the *winning* customers who need the alcohol. You need to keep the losers awake as long as possible, I would think. Or does one want to make them desperate?

@boba : '“Betting $19,000 to win a $700 pot just isn’t something that a person would do.” If it’s a strategy it is a very bad one' - I disagree, though I'm not a great poker player. It's of course bluffing, which is essential to win at poker, and I've seen players with a big bank use this strategy to drive out the weaker hands. When one player is far ahead of another in chips, and that player is credible (they don't bluff often), then betting $19,000 to win $700 forces the weaker hands to either call or fold. This strategy will force the weakly funded players to drop out. I've had experienced players (one was a girl here in the Philippines who's really good) use this strategy against me, and it works.

They were playing "no limit" - and it was my understanding that this means each player has a bottomless supply of chips.

No Limit refers to the fact that any player has the option of betting whatever they desire up to their current chip count. Limit hold-em has capped raises depending on the pot and how far along in the hand you are.

It is very obvious from your post that you are completely clueless about poker, like most other things. Overbetting is not "of course bluffing" and "weak funding" has nothing to do with it, since anyone with a small stack is going to be covered and therefore overbetting offers no advantage above the amount that would put them all-in.

Depends over how many hands. If you're playing enough hands to bet $170 million, then you can repeat this hand 10,000 times. At that point, the law of large numbers means the risk-reward ratio is quite attractive.

The law of large numbers has nothing to do with it. No matter how many times you repeat this the risk-reward ratio does not change.

Yes it does. That's because volatility scales with the square root of the number of iterations, while returns scale linearly. For an edge of $700 and a risk of $19,000, the hand has to win with p=.5184. If the hands only repeated a hundred times the aggregate series loses money about 32% of the time. If 10,000 hands are played the aggregate series loses money less than 0.1% of the time.
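The figures in the comment above can be checked with a short calculation. This is only a sketch under stated assumptions, not the commenter's own method: it models the hand as an even-money bet of $19,000 with an expected profit of $700 (the reading under which p = 0.5184 follows), treats repetitions as independent, and takes "loses money" to mean a strictly negative total. The function names are hypothetical.

```python
import math

def win_prob_for_edge(edge, stake):
    """Win probability p of an even-money bet of `stake` whose expected
    profit is `edge`, from (2p - 1) * stake = edge. (Assumed model.)"""
    return 0.5 * (1 + edge / stake)

def prob_aggregate_loss(n, p):
    """Exact binomial probability that n independent even-money bets,
    each won with probability p, end with a strictly negative total
    (i.e. fewer than half of the bets are won)."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(n + 1):
        if 2 * k >= n:  # half or more wins: total is zero or positive
            break
        # Work in log space so large n does not overflow or underflow.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return total

p = win_prob_for_edge(edge=700, stake=19_000)
print(f"p = {p:.4f}")
for n in (100, 10_000):
    print(f"n = {n:>6}: P(total < 0) = {prob_aggregate_loss(n, p):.5f}")
```

Under these assumptions the aggregate loses money roughly one time in three over 100 repetitions and well under 0.1% of the time over 10,000. The per-hand expectation is the same in both cases; only the chance of ending behind changes.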

A similar phenomenon is well known and exploited in quantitative finance. High-frequency strategies may only make money on slightly more than 50% of their trades, whereas Warren Buffett may make money on 80% of his trades. But if the high-frequency trader is making thousands of trades a day, and Buffett's only making a few a year, the former will experience much less financial risk over a calendar year. The poker bot's optimizing in the same way that stock trading bots do.

A bad strategy repeated often won't "make up for it in volume". But in poker it's mixed-strategies. I should think an optimal strategy against experts would *require* weird bluffs with positive probability. Maybe it would be only 1/1000 probability with that hand--- but the computer could do it.
I would think, actually, that a computer would do relatively well against expert players in poker, because they wouldn't make stupid mistakes. Against worse players, the expert players would do better than the computer would do against them (that is, wipe them out faster). Also, against a worse player, being able to read body language would be useful.

That isn't really true - if your opponent calls with the incorrect frequency of hands that beats your range, you make money from the bet. Positive expectation is all that matters, not risk vs. reward.

If it helps, I've been in the poker industry for 12 years now, and played for a living for three.

Although the humans' win rate (9 big blinds per hundred hands) is indicative of a close match, it is still significant at the stakes these players played, and the metric chosen by the CMU team is irrelevant (spin doctoring at its finest). It will certainly be interesting to see what happens in the next few years and whether they can close the gap. In his speech, Polk explains that the mistakes made by the bot are what humans consider "gifts", basically isolated incidents in an otherwise solid strategy, so I wonder whether those gifts are a function of the bot's overall strategy or can be eliminated by improving the bot.

Well, it sounds like the end of online poker (where there's no way to tell a competitor isn't using a bot) can't be far off.

"over 80,000 hands and $170 million of virtual money being bet, three-quarters of a million bucks is pretty much a rounding error, the experimenters said, and can’t be considered a statistically significant victory."

This is Lying With Statistics 101. $9.15 per hand at 50/100 blinds of duplicate no-limit hold'em is a VERY statistically significant result. Like I'm far too lazy to calculate the variance of the exact format here, but we're talking p<.01 that this bot was better than the four humans.
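The same result looks very different depending on the denominator. The sketch below uses only figures quoted in the post ($732,713 won, 80,000 hands, $170 million wagered) plus the 50/100 blinds stated in the comment above. The function names are hypothetical and the calculation is illustrative only.

```python
HUMAN_WINNINGS = 732_713     # dollars, from the quoted NBC article
HANDS = 80_000               # total hands played
TOTAL_WAGERED = 170_000_000  # dollars of virtual money bet
BIG_BLIND = 100              # dollars, per the 50/100 blinds in the comment

def dollars_per_hand(winnings, hands):
    """Average winnings per hand in dollars."""
    return winnings / hands

def bb_per_100(winnings, hands, big_blind):
    """Win rate in big blinds per 100 hands, the usual poker metric."""
    return (winnings / big_blind) / hands * 100

def share_of_wagered(winnings, wagered):
    """Winnings as a fraction of all money bet (the 'rounding error' view)."""
    return winnings / wagered

print(f"${dollars_per_hand(HUMAN_WINNINGS, HANDS):.2f} per hand")
print(f"{bb_per_100(HUMAN_WINNINGS, HANDS, BIG_BLIND):.2f} bb/100")
print(f"{share_of_wagered(HUMAN_WINNINGS, TOTAL_WAGERED):.2%} of money wagered")
```

Measured against money wagered the margin is a fraction of a percent; measured per hand it is about $9.16, or roughly 9 big blinds per hundred hands.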

You mean that the four humans were better than the bot?

No, I don't think the humans were working as a team if that's what you are implying.

Anyway, as stated in this thread already, the stats clearly show that the best humans are still much better than that particular bot at heads up no limit holdem. If Human A beat Human B for 9 bb/100 over 80,000 hands, nobody would say Human B is competitive with Human A. It borders on a massacre.

Clearly CMU is using the mainstream's misunderstanding of poker and math (as Terrence put it, "lying with statistics") to tout their bot's skill. Maybe this bot could crush a bad poker player and compete with an average one, but it is still leaps and bounds behind elites like Doug Polk et al. This is like claiming that a chess bot that won 200 out of 1000 chess matches vs Magnus Carlsen battled the world's best to a draw. Absurd.

First of all, the machine actually beat Jason Les, one of the four best poker players in the world. This was over many hands. I think it's not unfair to say that in terms of sheer strength, it's "up there" among the top four. Nobody said it was the best, but "up there among the four best players in the world" is impressive enough for a first effort, don't you think? If you think that an average poker player - maybe you - could beat Jason Les in a high-stakes no limit game, I'm sure that Jason would be happy to let you test your hypothesis.

Second, regarding your comparison to winning only 200 out of 1000 matches: In what sense do you think the poker situation is analogous to this? If you look at the actual numbers, Claudico won pretty much exactly 500 out of 1000 hands, or within the statistical margin of error. Of course, that's not what counts in poker - it's about money won. But of course you need to understand the net differences as a ratio of the money wagered. That's how ratios work. It's not lying with statistics. It's mathematical literacy.

According to the anecdotes of these players:
http://forumserver.twoplustwo.com/58/heads-up-nl/hunl-whats-your-standard-deviation-according-hem-1153499/

The standard deviation of heads-up NLHE is somewhere around 100-200 bb per hundred hands. The humans beat Claudico by a combined 7320 bb over 80000 hands. As Leo says, it was a massacre. Using the sum wagered is obfuscating because it's not a useful metric of performance in poker. It would be somewhat like saying one NBA team is only a little better than another in a best-of-7 if it swept the other and won every game by 25 points, "but if you consider how many possessions were in those four games, it was statistically insignificant".
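A rough significance check can be built from the numbers in this comment. The sketch below rests on simplifying assumptions that are not from the match organizers: each 100-hand block is treated as an independent draw, the total is treated as approximately normal, and the duplicate format mentioned earlier in the thread is not modeled. The standard deviation range is the anecdotal one from the linked forum thread, and the function names are hypothetical.

```python
import math

def z_score(total_win_bb, hands, sd_bb_per_100):
    """z-score of a total win, assuming independent 100-hand blocks
    that each have standard deviation sd_bb_per_100 (in big blinds)."""
    blocks = hands / 100
    sd_total = sd_bb_per_100 * math.sqrt(blocks)
    return total_win_bb / sd_total

def one_sided_p(z):
    """One-sided p-value under a standard normal approximation."""
    return 0.5 * math.erfc(z / math.sqrt(2))

TOTAL_WIN_BB = 7320  # combined human win in big blinds, from the comment
HANDS = 80_000

for sd in (100, 150, 200):  # anecdotal range quoted above
    z = z_score(TOTAL_WIN_BB, HANDS, sd)
    print(f"sd = {sd} bb/100: z = {z:.2f}, one-sided p = {one_sided_p(z):.4f}")
```

Under this simplified model the answer depends heavily on which standard deviation applies: at 100 bb per hundred hands the one-sided p-value falls below 0.01, while at 200 it is close to 0.10.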

"If you think that an average poker player – maybe you – could beat Jason Les in a high-stakes no limit game, I’m sure that a Jason would be happy to let you test your hypothesis."

I do not believe that I am a favourite over Jason in a HUNLHE freezeout. However I would bet -- quite large -- that I (or most pros) would lose less than 7320bb over 80000 hands against this collection of players.

NLHE bots will be level with humans reasonably soon. But this bot, against these humans? Nope.

David, the fact that you would say something like this betrays that you are completely ignorant of poker statistics and of measuring poker success, since this is a super-basic issue that has been solved long long ago.

I don't disagree with these theses in general, but I think that Mr. Polk's comments might be slightly out of context or slightly misconstrued. Overbetting is in fact well-studied: section 14.1 of _Mathematics of Poker_ discusses a toy example under which the optimal bet size is all-in because the value of the situation to the betting player increases monotonically with the bet size chosen for both value-bets and bluffs. It's a toy example, but it corresponds pretty well to plenty of real-life situations.

Moreover, a YouTube search for "Tom Dwan overbet" will show that large overbets are in fact a tool in the arsenal of at least some top players.

Again, not disagreeing with the overall view, just the example. Large overbets are something that real players do, and they know why they're doing it, and their reasons for doing it probably roughly correspond to the computer's.

@Nate: When we interviewed Doug last week, the 19k into $700 thing was discussed, and it sounded to me like he was actually complimenting the bot on this play, basically noting that most (non-all-star) humans are loath to lay 27:1 on a bet, but that the bot has no qualms about it if it feels it is correct.

http://pokercast.twoplustwo.com/pokercast.php?pokercast=361 (Interview with Doug starts at about 51 minutes.)
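The 27:1 figure, and what it implies, can be worked out directly. This sketch assumes the simplest case, a pure bluff that wins the pot when the opponent folds and loses the whole bet when called. A bet made with a hand that sometimes wins when called needs a lower fold rate, and that case is not modeled here. The function names are hypothetical.

```python
def odds_laid(bet, pot):
    """Odds the bettor is laying: dollars risked per dollar to be won."""
    return bet / pot

def breakeven_fold_rate(bet, pot):
    """Fold frequency f at which a pure bluff has zero expectation:
    f * pot - (1 - f) * bet = 0  =>  f = bet / (bet + pot)."""
    return bet / (bet + pot)

BET, POT = 19_000, 700
print(f"laying {odds_laid(BET, POT):.1f} to 1")
print(f"break-even fold rate: {breakeven_fold_rate(BET, POT):.1%}")
```

On that reading the opponent has to fold a little over 96% of the time for the bet to break even as a bluff, which is why the play looks so unnatural to a human.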

Does the bot FEEL it is correct, does the bot THINK it is correct, or does the bot have no self-reflective capability whatsoever?

“Polk’s careful wording–he doesn’t say the computer’s strategy was wrong but that it was inhuman and beyond his understanding–is a telling indicator of respect.”

Mood affiliation FTW?

Poker is like economics. It's not really that complicated. :-)

It's not clear what the actual payoff structure for the humans was, or whether all four came out ahead.

I'm guessing the $100K was paid proportionately to how much of the $732K they each won. I haven't thought about how this might influence their strategy.

It sounds like the algorithm was the only loser among the five. If so, I'd say the algorithm suffered a big defeat, and calling the result a virtual tie is inaccurate.

It really should be pointed out that the bot is playing head to head against one human, not in a more general game with many humans / bots.

Moreover, I'm no computer scientist, but I would be very surprised if extending the algorithm to a more general, N opponent game is computationally trivial; curse of dimensionality.

To add: I haven't seen any discussion about human learning. It seems obvious, but the heuristics that human game players use and succeed with developed by playing against other human players. With exposure to bot players, they'll modify those decision rules.

And yet computers still can't play chess.

The game they actually are bad at is bridge.

They're not playing chess, so I don't know that they're "bad" or "good" at it. They sure do calculate things if given evaluation functions by humans, though!

I challenge thee to a duel.

When we say that robots beat human chess champions, or played poker at a similar level to the world's best humans, or did this or that, we really mean that a team of humans was able to do that, a team of humans applying our collective computational theoretical knowledge developed over centuries and allowed more tools than the human competitors. (For example, the human chess and poker players may not have been allowed to use supplementary computational devices while the human robot developers were able to employ whatever amount of memory and CPU power was necessary.) From that perspective, should it really surprise us that a team of humans with far more resources is able to defeat a single human with relatively few resources?

Part of this perspective is understanding that so-called physical capital, like robots, is a fiction. What we call physical capital is really just a physical manifestation of an intermediate step in the deployment of human capital and land capital (natural resources). What's happening here is that when one throws more human and land capital at a problem, like playing chess or poker, one can often do better than when one uses less capital. Why should that be surprising?

That is over a 0.4% human edge. So like the house edge in a standard blackjack game.

I also say it was basically a tie when I leave the casino.

As has been mentioned a few times, I was also surprised by this article. Within the poker community this was seen as a huge win for the human players. The shoddiness of the statistics showing a 'draw' should really be emphasized in this post. It's a pity that this view is the one being spread within the finance/economics online community.

Agreed with everyone else saying this was a sham article and write-up. The poker players won by a resounding margin. Anyone who can sustain a 9.5 bb/100 win rate is a top-notch pro in HU poker. Such a shame.

How are they the best poker players the world has ever known? Never heard of any of them.

Without real money at stake, they are not playing poker. Theory and practice diverge with money at risk. This should actually be one of the computer's main advantages.
