Monte-Carlo Go Reinforcement Learning Experiments

This paper describes experiments that use reinforcement learning to compute the pattern urgencies used during simulations in a Monte-Carlo Go architecture. Monte-Carlo is currently a popular technique in computer Go. In a previous study, Monte-Carlo was combined with domain-dependent knowledge in the Go-playing program Indigo, for which a 3×3 pattern database was built manually in 2003. This paper explores the possibility of tuning the 3×3 pattern urgencies automatically with reinforcement learning. On 9×9 boards, within the Monte-Carlo architecture of Indigo, the urgencies obtained by our automatic learning experiments outperform the manually built ones by an average margin of 3 points, which is satisfactory. Although the current results on 19×19 boards are promising, obtaining strictly positive results at that board size remains future work.
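To make the idea concrete, below is a minimal sketch, not the paper's actual implementation, of how 3×3 pattern urgencies might drive move selection inside a Monte-Carlo playout and be nudged by a simple reinforcement update on the playout outcome. All names (the board dictionary, pattern_key, sample_move, reinforce) and the learning rate are illustrative assumptions.

```python
import random
from collections import defaultdict

ALPHA = 0.01  # learning rate (assumed value, for illustration only)

# One urgency value per 3x3 pattern key; unseen patterns start at 1.0.
urgency = defaultdict(lambda: 1.0)

def pattern_key(board, move):
    """Return a hashable key for the 3x3 neighbourhood around `move`.

    `board` is assumed to be a dict mapping (x, y) -> stone colour.
    A real implementation would canonicalise rotations, reflections,
    and colour swaps so symmetric patterns share one urgency.
    """
    x, y = move
    return tuple(board.get((x + dx, y + dy), "edge")
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1))

def sample_move(board, legal_moves):
    """Pick a playout move with probability proportional to its
    pattern urgency (urgency-weighted random selection)."""
    weights = [urgency[pattern_key(board, m)] for m in legal_moves]
    return random.choices(legal_moves, weights=weights, k=1)[0]

def reinforce(played_patterns, outcome):
    """Shift the urgency of every pattern matched during the playout
    toward the result (+1 for a win, -1 for a loss), keeping each
    urgency strictly positive so sampling stays well defined."""
    for key in played_patterns:
        urgency[key] = max(1e-3, urgency[key] + ALPHA * outcome)
```

After many playouts, patterns whose moves correlate with wins accumulate higher urgency and are sampled more often, which is the effect the learned urgencies are meant to have on simulation quality.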
