Paper information - Effective short-term opponent exploitation in simplified poker

Uncertainty in poker stems from two key sources: the shuffled deck and an adversary whose strategy is unknown. One approach to playing poker is to find a pessimistic game-theoretic solution (i.e., a Nash equilibrium), but human players have idiosyncratic weaknesses that can be exploited if some model or counter-strategy can be learned by observing their play. However, games against humans last for at most a few hundred hands, so learning must be very fast to be useful. We explore two approaches to opponent modelling, parameter estimation and expert algorithms, in the context of Kuhn poker, a small game for which game-theoretic solutions are known. Experiments demonstrate that, even in this small game, convergence to maximally exploitive solutions in a small number of hands is impractical, but that good (e.g., better than Nash) performance can be achieved in as few as 50 hands. Finally, we show that amongst a set of strategies with equal game-theoretic value, in particular the set of Nash equilibrium strategies, some are preferable because they speed learning of the opponent’s strategy by exploring it more effectively.
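
As a rough illustration of what parameter estimation can look like in this setting, the sketch below keeps a Beta posterior over the probability that an opponent takes one binary action (for example, betting in a given decision situation). This is not the paper's method: the class name `BetaEstimate`, the uniform prior, and the assumption that every relevant decision is fully observed are all hypothetical choices made for the example. In real play the opponent's card is often hidden, so only some hands would yield a usable observation.

```python
from dataclasses import dataclass


@dataclass
class BetaEstimate:
    """Beta posterior over the probability of one binary opponent action.

    Hypothetical helper for illustration; not taken from the paper.
    """

    alpha: float = 1.0  # pseudo-count for "action taken" (assumed uniform prior)
    beta: float = 1.0   # pseudo-count for "action not taken" (assumed uniform prior)

    def observe(self, took_action: bool) -> None:
        """Update the posterior with one fully observed decision."""
        if took_action:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mean(self) -> float:
        """Posterior mean estimate of the action probability."""
        return self.alpha / (self.alpha + self.beta)


if __name__ == "__main__":
    estimate = BetaEstimate()
    # Suppose the opponent took the action in 3 of 10 observed decisions.
    for took_action in [True] * 3 + [False] * 7:
        estimate.observe(took_action)
    print(f"estimated action probability: {estimate.mean():.3f}")
```

With only a handful of observations the prior still carries real weight in the estimate, which is one way to see why learning within a few hundred hands is demanding.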
