The grand challenge of computer Go

The ancient oriental game of Go has long been considered a grand challenge for artificial intelligence. For decades, computer Go has defied the classical methods in game tree search that worked so successfully for chess and checkers. However, recent play in computer Go has been transformed by a new paradigm for tree search based on Monte-Carlo methods. Programs based on Monte-Carlo tree search now play at human-master levels and are beginning to challenge top professional players. In this paper, we describe the leading algorithms for Monte-Carlo tree search and explain how they have advanced the state of the art in computer Go.
