An Introduction to Counterfactual Regret Minimization

In 2000, Hart and Mas-Colell introduced regret matching, an important game-theoretic algorithm. Players reach equilibrium play by tracking regrets for past plays and choosing future plays with probabilities proportional to positive regrets. The technique is not only simple and intuitive but has also sparked a revolution in computer play of some of the most difficult bluffing games, including clear dominance in the annual computer poker competitions. Since the algorithm is relatively recent, few curricular materials are available to introduce regret-based algorithms to the next generation of researchers and practitioners in this area. These materials represent a modest first step towards making recent innovations more accessible to advanced Computer Science undergraduates, graduate students, interested researchers, and ambitious practitioners. In Section 2, we introduce the concept of player regret, describe the regret-matching algorithm, present a rock-paper-scissors worked example in the literate programming style, and suggest related exercises. Counterfactual Regret Minimization (CFR) is introduced in Section 3 with a worked example solving Kuhn Poker. Supporting code is provided for a substantive CFR exercise computing optimal play for 1-die-versus-1-die Dudo. In Section 4, we briefly mention means of “cleaning” approximately optimal computed policies, which can in many cases improve results. Section 5 covers an advanced application of CFR to games with repeated states (e.g., through imperfect recall abstraction) that can reduce the computational complexity of a CFR training iteration from exponential to linear. Here, we use our independently devised game of Liar Die to demonstrate application of the algorithm. We then suggest that the reader apply the technique to 1-die-versus-1-die Dudo with a memory of 3 claims. In Section 6, we briefly discuss an open research problem: among possible equilibrium strategies, how do we compute one that optimally exploits opponent errors?
The reader is invited to modify our Liar Die example code so as to gain insight into this interesting problem. Finally, in Section 7, we suggest further challenge problems and paths for continued learning.
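Regret matching, as summarized above, plays each action with probability proportional to its positive cumulative regret. The following is a minimal Python sketch of that rule on rock-paper-scissors against a fixed, exploitable opponent. It is not the paper's code. Several details are illustrative assumptions: the function names (`utility`, `strategy_from_regrets`, `train`), the opponent mix (0.4, 0.3, 0.3), and the choice to update regrets with expected utilities rather than sampled plays, which keeps the run deterministic.

```python
"""Regret matching for rock-paper-scissors against a fixed opponent (illustrative sketch)."""

ACTIONS = ("rock", "paper", "scissors")


def utility(mine: int, theirs: int) -> int:
    """Payoff to us: +1 win, 0 tie, -1 loss. Action i beats action (i - 1) mod 3."""
    if mine == theirs:
        return 0
    return 1 if (mine - theirs) % 3 == 1 else -1


def strategy_from_regrets(regrets: list[float]) -> list[float]:
    """Play each action in proportion to its positive cumulative regret;
    fall back to uniform play when no regret is positive."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def train(opponent: list[float], iterations: int) -> list[float]:
    """Return the average strategy after `iterations` rounds of regret matching.

    Assumption: regrets use expected utilities against the opponent's mixed
    strategy instead of sampled plays, so the result is deterministic.
    """
    n = len(ACTIONS)
    regrets = [0.0] * n
    strategy_sum = [0.0] * n
    # Expected payoff of each pure action against the fixed opponent mix.
    action_utils = [
        sum(opponent[b] * utility(a, b) for b in range(n)) for a in range(n)
    ]
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        for a in range(n):
            strategy_sum[a] += strategy[a]
        # Expected payoff of the current mixed strategy.
        expected = sum(strategy[a] * action_utils[a] for a in range(n))
        # Regret for not having played pure action a instead of the current mix.
        for a in range(n):
            regrets[a] += action_utils[a] - expected
    # The average strategy, not the last one, is what regret matching reports.
    return [s / iterations for s in strategy_sum]


if __name__ == "__main__":
    avg = train(opponent=[0.4, 0.3, 0.3], iterations=10_000)
    for name, p in zip(ACTIONS, avg):
        print(f"{name:>8}: {p:.4f}")
```

Because this opponent over-plays rock, the average strategy shifts almost entirely to paper, the best response. When two such learners play each other in a two-player zero-sum game, their average strategies approach an equilibrium. That self-play setting is the one regret-based methods such as CFR build on.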

Todd W. Neller | Marc Lanctot

[1] S. Hart, et al. A Simple Adaptive Procedure Leading to Correlated Equilibrium, 2000.

[2] Michael H. Bowling, et al. Computing Robust Counter-Strategies, 2007, NIPS.

[3] Michael H. Bowling, et al. Regret Minimization in Games with Incomplete Information, 2007, NIPS.

[4] Q. Zeng, et al. Reverse Auction: The Lowest Unique Positive Integer Game, 2007.

[5] Yoav Shoham, et al. Essentials of Game Theory: A Concise Multidisciplinary Introduction, 2008.

[6] Michael H. Bowling, et al. Data Biased Robust Counter Strategies, 2009, AISTATS.

[7] Kevin Waugh, et al. A Practical Use of Imperfect Recall, 2009, SARA.

[8] Kevin Waugh, et al. Monte Carlo Sampling for Regret Minimization in Extensive Games, 2009, NIPS.

[9] Peter Bro Miltersen, et al. Computing a Quasi-Perfect Equilibrium of a Two-Player Game, 2010.

[10] Reiner Knizia. Dice Games Properly Explained, 2010.

[11] Todd W. Neller, et al. Approximating Optimal Dudo Play with Fixed-Strategy Iteration Counterfactual Regret Minimization, 2011, ACG.

[12] Michael H. Bowling, et al. No-Regret Learning in Extensive-Form Games with Imperfect Recall, 2012, ICML.

[13] Kevin Waugh, et al. Strategy Purification and Thresholding: Effective Non-Equilibrium Approaches for Playing Large Games, 2012, AAMAS.

[14] Michael H. Bowling, et al. Monte Carlo Sampling and Regret Minimization for Equilibrium Computation and Decision-Making in Large Extensive Form Games, 2013.