The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)

In this article, we introduce the Arcade Learning Environment (ALE): both a challenge problem and a platform and methodology for evaluating the development of general, domain-independent AI technology. ALE provides an interface to hundreds of Atari 2600 game environments, each one different, interesting, and designed to be a challenge for human players. ALE presents significant research challenges for reinforcement learning, model learning, model-based planning, imitation learning, transfer learning, and intrinsic motivation. Most importantly, it provides a rigorous testbed for evaluating and comparing approaches to these problems. We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning. In doing so, we also propose an evaluation methodology made possible by ALE and report empirical results on more than 55 different games. All of the software, including the benchmark agents, is publicly available.
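
Comparing agents across many games is complicated because raw scores sit on very different scales from one game to the next. One common remedy is to rescale each game's scores so that the worst algorithm under comparison maps to 0 and the best maps to 1. Each algorithm is then summarized by its mean and median normalized score across games. The block below is a minimal sketch of this kind of inter-algorithm normalization under stated assumptions, not necessarily the exact procedure the article proposes. The function names (inter_algorithm_normalize, summarize) and the toy scores in EXAMPLE_SCORES are hypothetical and are not results from the article.

```python
from statistics import mean, median

# Hypothetical toy scores (NOT results from the article): game -> algorithm -> raw score.
EXAMPLE_SCORES = {
    "game_a": {"random": 10.0, "agent_x": 110.0, "agent_y": 60.0},
    "game_b": {"random": 0.0, "agent_x": 2.0, "agent_y": 4.0},
}


def inter_algorithm_normalize(scores):
    """Rescale each game's scores to [0, 1] relative to the algorithms being compared.

    The lowest-scoring algorithm on a game maps to 0.0 and the highest to 1.0.
    If every algorithm ties on a game, all of them receive 0.0 there (an
    arbitrary convention assumed for this sketch).
    """
    normalized = {}
    for game, by_alg in scores.items():
        lo, hi = min(by_alg.values()), max(by_alg.values())
        span = hi - lo
        normalized[game] = {
            alg: (s - lo) / span if span > 0 else 0.0
            for alg, s in by_alg.items()
        }
    return normalized


def summarize(normalized):
    """Return each algorithm's mean and median normalized score across games.

    Assumes every game reports a score for the same set of algorithms.
    """
    algorithms = next(iter(normalized.values())).keys()
    summary = {}
    for alg in algorithms:
        per_game = [normalized[game][alg] for game in normalized]
        summary[alg] = {"mean": mean(per_game), "median": median(per_game)}
    return summary


if __name__ == "__main__":
    for alg, stats in summarize(inter_algorithm_normalize(EXAMPLE_SCORES)).items():
        print(f"{alg}: mean={stats['mean']:.2f} median={stats['median']:.2f}")
```

The normalization is relative to the set of algorithms being compared, so adding or removing an algorithm can change every other algorithm's normalized scores. Reporting the median alongside the mean is a design choice. The median is less sensitive than the mean to a few games on which one algorithm does unusually well or badly.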
