Game-independent AI agents for playing Atari 2600 console games

This research focuses on developing AI agents that play arbitrary Atari 2600 console games without game-specific assumptions or prior knowledge. Two main approaches are considered: reinforcement-learning-based methods and search-based methods. The RL-based methods learn to play a given game from feature vectors generated from the game screen and the console RAM. The search-based methods use the emulator to simulate the consequences of actions into the future, aiming to play as well as possible while exploring only a very small fraction of the state space. To ensure the generic nature of our methods, all agents are designed and tuned on four specific games. Once development and parameter selection are complete, the agents' performance is evaluated on a set of 50 randomly selected games. Significant learning is reported for the RL-based methods on most games, and the search-based methods achieve some instances of human-level performance.
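The search-based approach described above can be illustrated with a minimal sketch: the agent clones the emulator state, simulates each candidate action followed by random rollouts, and picks the action with the highest mean simulated return. The `ToyEmulator`, its `clone`/`act` methods, and all parameter values below are illustrative assumptions, not the paper's actual emulator interface or settings.

```python
import random


class ToyEmulator:
    """Stand-in for the console emulator (illustrative assumption): the
    state is an integer position, actions move it left or right, and the
    reward equals the displacement, so rightward moves are rewarded."""

    def __init__(self, pos=0):
        self.pos = pos

    def clone(self):
        # Real search agents rely on the emulator's ability to save and
        # restore state so that futures can be simulated without side effects.
        return ToyEmulator(self.pos)

    def act(self, action):
        # action is -1 or +1; the returned reward is the displacement
        self.pos += action
        return action


def rollout_return(emulator, depth, rng):
    """Estimate a state's value by simulating random actions to a fixed depth."""
    total = 0.0
    for _ in range(depth):
        total += emulator.act(rng.choice([-1, 1]))
    return total


def search_action(emulator, actions, depth=5, rollouts=50, seed=0):
    """Pick the action whose simulated futures yield the highest mean return."""
    rng = random.Random(seed)
    best_action, best_value = None, float("-inf")
    for a in actions:
        value = 0.0
        for _ in range(rollouts):
            sim = emulator.clone()  # simulate on a copy, not the live game
            value += sim.act(a) + rollout_return(sim, depth - 1, rng)
        value /= rollouts
        if value > best_value:
            best_action, best_value = a, value
    return best_action


# The agent should prefer +1, since rightward moves are rewarded.
print(search_action(ToyEmulator(), actions=[-1, 1]))
```

This sketch only demonstrates the simulate-and-evaluate loop; the paper's search agents operate on the real console state space, where the challenge is playing well while visiting only a tiny fraction of reachable states.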
