The Hanabi Challenge: A New Frontier for AI Research

Abstract

From the early days of computing, games have been important testbeds for studying how well machines can make sophisticated decisions. In recent years, machine learning has made dramatic advances, with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay among two to five players and imperfect information. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques for such theory of mind reasoning will be crucial for success not only in Hanabi, but also in broader collaborative efforts, especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.
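The imperfect-information structure the abstract describes is concrete: in Hanabi a player never sees their own cards and must infer them from teammates' hints. As a minimal sketch of this belief reasoning (the deck composition is the standard Hanabi deck; the function names here are illustrative, not part of the Hanabi Learning Environment API), the following shows how a single color hint prunes a player's candidate set for each card slot, including the negative information carried by the slots that were *not* hinted:

```python
from collections import Counter

COLORS = "RYGWB"                               # red, yellow, green, white, blue
RANK_COUNTS = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1}   # copies of each rank per color

# Full multiset of the 50-card Hanabi deck; its keys are the 25 distinct
# (color, rank) card types a hidden card could be.
DECK = Counter((c, r) for c in COLORS
               for r, n in RANK_COUNTS.items() for _ in range(n))

def initial_belief(hand_size=5):
    """Before any hints, every slot could be any of the 25 card types."""
    return [set(DECK) for _ in range(hand_size)]

def apply_color_hint(belief, hinted_slots, color):
    """A color hint marks exactly the slots of that color, so unhinted
    slots are now known NOT to be that color (negative information)."""
    for i, candidates in enumerate(belief):
        if i in hinted_slots:
            belief[i] = {card for card in candidates if card[0] == color}
        else:
            belief[i] = {card for card in candidates if card[0] != color}
    return belief

belief = initial_belief()
apply_color_hint(belief, hinted_slots={0, 2}, color="R")
# Slots 0 and 2 narrow to the 5 red ranks; the rest rule out red entirely.
```

The theory-of-mind challenge the paper highlights goes a step further than this filtering: a strong agent also asks *why* a teammate chose this hint over the alternatives, extracting information beyond the hint's literal content.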
