Dynamic Adaptation and Opponent Exploitation in Computer Poker

As a classic example of an imperfect-information game, Heads-Up No-Limit Texas Hold'em (HUNL) has been studied extensively in recent years. While state-of-the-art approaches based on Nash equilibrium have been successful, they lack the ability to model and exploit opponents effectively. This paper presents an evolutionary approach to discovering opponent models based on Long Short-Term Memory neural networks and Pattern Recognition Trees. Experimental results showed that poker agents built with this method can adapt to opponents they have never seen in training and exploit weak strategies far more effectively than Slumbot 2017, one of the cutting-edge Nash-equilibrium-based poker agents. In addition, agents evolved by playing against relatively weak rule-based opponents tied statistically with Slumbot in heads-up matches. Thus, the proposed approach is a promising new direction for building high-performance adaptive agents in HUNL and other imperfect-information games.
