Multi-Dimensional Deep Memory Atari-Go Players for Parameter Exploring Policy Gradients

Developing superior artificial board-game players is a widely studied area of Artificial Intelligence. Among the most challenging games is the Asian game of Go, which, despite its deceptively simple rules, has so far eluded the development of expert-level artificial players. In this paper we tackle this challenge through a combination of two recent developments in Machine Learning. We employ Multi-Dimensional Recurrent Neural Networks (MDRNNs) with Long Short-Term Memory (LSTM) cells, which handle the two-dimensional structure of the game board in a natural way. To improve both the convergence rate and the final performance, we train these networks using Policy Gradients with Parameter-based Exploration (PGPE), a recently developed Reinforcement Learning algorithm that has been found to have numerous advantages over Evolution Strategies. Our empirical results confirm the promise of this approach, and we discuss how it can be scaled up towards expert-level Go players.
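To make the board-scanning idea concrete, below is a minimal sketch of an MDRNN pass over the board. Plain tanh units stand in for the LSTM cells the paper employs, and all names, shapes, and the feature encoding are illustrative assumptions rather than the paper's exact implementation. The full architecture runs four sweeps, one starting from each corner, and combines the resulting hidden states at every intersection into a move preference, which is what makes the player independent of board size.

```python
import numpy as np

def mdrnn_sweep(board, W_in, W_h, W_v, b):
    """One corner-to-corner sweep of a 2D RNN over the board.

    board: (n, n, f) array of per-intersection features, e.g. a one-hot
    encoding of empty/black/white. tanh units stand in for LSTM cells.
    """
    n, k = board.shape[0], b.shape[0]
    h = np.zeros((n, n, k))
    for i in range(n):                          # top-to-bottom ...
        for j in range(n):                      # ... and left-to-right
            left = h[i, j - 1] if j > 0 else np.zeros(k)
            above = h[i - 1, j] if i > 0 else np.zeros(k)
            h[i, j] = np.tanh(W_in @ board[i, j] + W_h @ left + W_v @ above + b)
    return h

def move_preferences(board, params, w_out):
    """Combine four sweeps (one per corner, realized by flipping the
    input) through a shared linear readout into per-intersection scores."""
    sweeps = []
    for flip_i in (False, True):
        for flip_j in (False, True):
            x = board[::-1] if flip_i else board
            x = x[:, ::-1] if flip_j else x
            h = mdrnn_sweep(x, *params)
            h = h[::-1] if flip_i else h        # undo the flips so all
            h = h[:, ::-1] if flip_j else h     # sweeps align spatially
            sweeps.append(h)
    return np.concatenate(sweeps, axis=-1) @ w_out   # (n, n) move scores
```

Training these weights with PGPE then amounts to searching directly in parameter space: a Gaussian with per-parameter mean and standard deviation is maintained over the flattened network weights, symmetric perturbation pairs are evaluated by playing games, and both the mean and the exploration widths are adapted from the returns. The following is a minimal sketch of that update loop; the learning rates, baseline decay, and fitness interface are assumptions chosen for illustration, not the paper's exact setup.

```python
def pgpe(fitness, dim, iterations=1000,
         alpha_mu=0.2, alpha_sigma=0.1, sigma_init=2.0):
    """PGPE with symmetric sampling; hyperparameters are illustrative."""
    mu = np.zeros(dim)                   # mean of the search distribution
    sigma = np.full(dim, sigma_init)     # per-parameter exploration widths
    baseline = 0.0                       # moving-average reward baseline
    for _ in range(iterations):
        eps = np.random.randn(dim) * sigma          # parameter perturbation
        r_plus, r_minus = fitness(mu + eps), fitness(mu - eps)
        r_avg = 0.5 * (r_plus + r_minus)
        # Gradient estimates from the symmetric sample pair
        mu += alpha_mu * 0.5 * (r_plus - r_minus) * eps
        sigma += alpha_sigma * (r_avg - baseline) \
                 * (eps ** 2 - sigma ** 2) / sigma
        sigma = np.maximum(sigma, 1e-8)             # keep widths positive
        baseline += 0.1 * (r_avg - baseline)        # update moving average
    return mu
```

In this setting, `fitness` would unflatten a candidate parameter vector into the MDRNN weights above and return the outcome of one or more Atari-Go games played against a fixed opponent.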
