Monte-Carlo Go Reinforcement Learning Experiments

This paper describes experiments that use reinforcement learning to compute the pattern urgencies used during simulations in a Monte-Carlo Go architecture. Monte-Carlo is currently a popular technique in computer Go. In a previous study, Monte-Carlo was combined with domain-dependent knowledge in the Go-playing program Indigo, for which a 3×3 pattern database was built manually in 2003. This paper explores the possibility of tuning the 3×3 pattern urgencies automatically with reinforcement learning. On 9×9 boards, within the Monte-Carlo architecture of Indigo, the urgencies obtained by our automatic learning experiments outperform the manually built ones by an average margin of 3 points, which is satisfactory. Although the current results on 19×19 boards are promising, obtaining strictly positive results at that board size remains future work.
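To make the idea concrete, below is a minimal sketch, not the paper's actual implementation, of how 3×3 pattern urgencies might drive move selection inside a Monte-Carlo playout and be nudged by a simple reinforcement update on the playout outcome. All names (the board dictionary, pattern_key, sample_move, reinforce) and the learning rate are illustrative assumptions.

```python
import random
from collections import defaultdict

ALPHA = 0.01  # learning rate (assumed value, for illustration only)

# One urgency value per 3x3 pattern key; unseen patterns start at 1.0.
urgency = defaultdict(lambda: 1.0)

def pattern_key(board, move):
    """Return a hashable key for the 3x3 neighbourhood around `move`.

    `board` is assumed to be a dict mapping (x, y) -> stone colour.
    A real implementation would canonicalise rotations, reflections,
    and colour swaps so symmetric patterns share one urgency.
    """
    x, y = move
    return tuple(board.get((x + dx, y + dy), "edge")
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1))

def sample_move(board, legal_moves):
    """Pick a playout move with probability proportional to its
    pattern urgency (urgency-weighted random selection)."""
    weights = [urgency[pattern_key(board, m)] for m in legal_moves]
    return random.choices(legal_moves, weights=weights, k=1)[0]

def reinforce(played_patterns, outcome):
    """Shift the urgency of every pattern matched during the playout
    toward the result (+1 for a win, -1 for a loss), keeping each
    urgency strictly positive so sampling stays well defined."""
    for key in played_patterns:
        urgency[key] = max(1e-3, urgency[key] + ALPHA * outcome)
```

After many playouts, patterns whose moves correlate with wins accumulate higher urgency and are sampled more often, which is the effect the learned urgencies are meant to have on simulation quality.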
