StarCraft II: A New Challenge for Reinforcement Learning

This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially observed map; it has a large action space involving the selection and control of hundreds of units; it has a large state space that must be observed solely from raw input feature planes; and it has delayed credit assignment requiring long-term strategies over thousands of steps. We describe the observation, action, and reward specification for the StarCraft II domain and provide an open source Python-based interface for communicating with the game engine. In addition to the main game maps, we provide a suite of mini-games focusing on different elements of StarCraft II gameplay. For the main game maps, we also provide an accompanying dataset of game replay data from human expert players. We give initial baseline results for neural networks trained from this data to predict game outcomes and player actions. Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain. On the mini-games, these agents learn to achieve a level of play that is comparable to a novice player. However, when trained on the main game, these agents are unable to make significant progress. Thus, SC2LE offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures.

[1]  Julian Togelius,et al.  Mario AI competition , 2009, 2009 IEEE Symposium on Computational Intelligence and Games.

[2]  Brett Browning,et al.  A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..

[3]  Philip Hingston,et al.  A Turing Test for Computer Game Bots , 2009, IEEE Transactions on Computational Intelligence and AI in Games.

[4]  Julian Togelius,et al.  The 2009 Mario AI Competition , 2010, IEEE Congress on Evolutionary Computation.

[5]  Simon M. Lucas,et al.  Ms Pac-Man versus Ghost Team CEC 2011 competition , 2011, 2011 IEEE Congress of Evolutionary Computation (CEC).

[6]  Julian Togelius,et al.  Measuring Intelligence through Games , 2011, ArXiv.

[7]  Dong Yu,et al.  Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  Michael Buro,et al.  Real-Time Strategy Game Competitions , 2012, AI Mag..

[9]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[10]  Héctor Muñoz-Avila,et al.  CLASSQ-L: A Q-Learning Algorithm for Adversarial Real-Time Strategy Games , 2012, AIIDE 2012.

[11]  Santiago Ontañón,et al.  A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft , 2013, IEEE Transactions on Computational Intelligence and AI in Games.

[12]  Tom Schaul,et al.  A video game description language for model-based or interactive learning , 2013, 2013 IEEE Conference on Computational Inteligence in Games (CIG).

[13]  Glen Robertson,et al.  A Review of Real-Time Strategy Game AI , 2014, AI Mag..

[14]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[15]  Michael Buro,et al.  Global State Evaluation in StarCraft , 2014, AIIDE.

[16]  David Silver,et al.  Move Evaluation in Go Using Deep Convolutional Neural Networks , 2014, ICLR.

[17]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[18]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[19]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[20]  Marc G. Bellemare,et al.  The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.

[21]  Pieter Abbeel,et al.  Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.

[22]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[23]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[24]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[25]  Wojciech Jaskowski,et al.  ViZDoom: A Doom-based AI research platform for visual reinforcement learning , 2016, 2016 IEEE Conference on Computational Intelligence and Games (CIG).

[26]  Florian Richoux,et al.  TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games , 2016, ArXiv.

[27]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[28]  Julian Togelius,et al.  Ieee Transactions on Computational Intelligence and Ai in Games the 2014 General Video Game Playing Competition , 2022 .

[29]  Nicolas Usunier,et al.  Episodic Exploration for Deep Deterministic Policies for StarCraft Micromanagement , 2016, ICLR.

[30]  Razvan Pascanu,et al.  Sim-to-Real Robot Learning from Pixels with Progressive Nets , 2016, CoRL.

[31]  Yuandong Tian,et al.  ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games , 2017, NIPS.

[32]  Peng Peng,et al.  Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games , 2017, 1703.10069.

[33]  Jun Wang,et al.  Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games , 2017, ArXiv.

[34]  Tom Schaul,et al.  Learning from Demonstrations for Real World Reinforcement Learning , 2017, ArXiv.

[35]  Sebastian Risi,et al.  Learning macromanagement in starcraft from replays using deep learning , 2017, 2017 IEEE Conference on Computational Intelligence and Games (CIG).

[36]  Shai Rozenberg,et al.  Playing SNES in the Retro Learning Environment , 2016, ICLR.

[37]  Yuval Tassa,et al.  Data-efficient Deep Reinforcement Learning for Dexterous Manipulation , 2017, ArXiv.

[38]  Sergey Levine,et al.  Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection , 2016, Int. J. Robotics Res..

[39]  Tom Schaul,et al.  Deep Q-learning From Demonstrations , 2017, AAAI.