Simulation-Based Approach to General Game Playing

The aim of General Game Playing (GGP) is to create intelligent agents that automatically learn how to play many different games at an expert level without any human intervention. The most successful GGP agents in the past have used traditional game-tree search combined with an automatically learned heuristic function for evaluating game states. In this paper we describe a GGP agent that instead uses a Monte Carlo/UCT simulation technique for action selection, an approach recently popularized in computer Go. Our GGP agent has proven its effectiveness by winning last year's AAAI GGP Competition. Furthermore, we introduce and empirically evaluate a new scheme for automatically learning search-control knowledge for guiding the simulation playouts, showing that it offers significant benefits for a variety of games.
