Adaptive play in Texas Hold'em Poker

We present a Texas Hold'em poker player for limit heads-up games. Our bot is designed to adapt automatically to the strategy of the opponent and is not based on Nash equilibrium computation. The main idea is to design a bot that builds beliefs about its opponent's hand. A forest of game trees is generated according to those beliefs, and the solutions of the trees are combined to make the best decision. The beliefs are updated during the game according to several methods, each of which corresponds to a basic strategy. We then use an exploration-exploitation bandit algorithm, namely UCB (Upper Confidence Bound), to select a strategy to follow. This results in a global play that takes the opponent's strategy into account and turns out to be rather unpredictable. Indeed, if a given strategy is being exploited by an opponent, the UCB algorithm will detect it through change point detection and will choose another one. The resulting program, called Brennus, participated in the AAAI'07 Computer Poker Competition, in both the online and equilibrium competitions, and ranked eighth out of seventeen competitors.
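To make the strategy-selection step concrete, the sketch below shows the standard UCB1 rule applied to a set of candidate basic strategies. This is a minimal illustration under assumptions, not the paper's implementation: the function and variable names are hypothetical, the reward scaling and the change-point-detection reset used by Brennus are not shown, and the actual belief-update methods behind each strategy are omitted.

```python
import math

def ucb1_select(counts, rewards):
    """Return the index of the basic strategy to follow next, using UCB1.

    counts[i]  -- number of hands played so far with strategy i
    rewards[i] -- total payoff accumulated so far by strategy i
    (Hypothetical interface; the paper's own bookkeeping may differ.)
    """
    # Play each strategy at least once before trusting the confidence bounds.
    for i, c in enumerate(counts):
        if c == 0:
            return i
    total = sum(counts)
    # UCB1 score: empirical mean payoff plus an exploration bonus.
    scores = [
        rewards[i] / counts[i] + math.sqrt(2.0 * math.log(total) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(counts)), key=lambda i: scores[i])
```

In use, after each hand the selected strategy's count is incremented and its payoff added to its cumulative reward; a strategy that an opponent starts to exploit sees its empirical mean drop, so the rule shifts play toward other strategies.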
