Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

We propose deep distributed recurrent Q-networks (DDRQN), which enable teams of agents to learn to solve communication-based coordination tasks. In these tasks, the agents are not given any pre-designed communication protocol. Therefore, in order to communicate successfully, they must first automatically develop and agree upon their own communication protocol. We present empirical results on two multi-agent learning problems based on well-known riddles, demonstrating that DDRQN can successfully solve such tasks and discover elegant communication protocols to do so. To our knowledge, this is the first time deep reinforcement learning has succeeded in learning communication protocols. In addition, we present ablation experiments confirming that each of the main components of the DDRQN architecture is critical to its success.
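The abstract describes the architecture only at a high level. As a rough orientation, the sketch below shows a minimal recurrent Q-network agent in the spirit of DDRQN, assuming an LSTM over each agent's observation history, a single set of weights shared by all agents, and inputs that include the agent's last action and its index. All class names, dimensions, and the exact input encoding are hypothetical illustrations, not the paper's specification.

```python
# Minimal sketch (assumptions noted above): a DRQN-style recurrent Q-network
# whose weights are shared by all agents, with per-agent hidden states.
import torch
import torch.nn as nn


class RecurrentQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, n_agents, hidden_dim=64):
        super().__init__()
        # Input: observation + one-hot last action + one-hot agent index (illustrative encoding).
        in_dim = obs_dim + n_actions + n_agents
        self.encoder = nn.Linear(in_dim, hidden_dim)
        self.rnn = nn.LSTMCell(hidden_dim, hidden_dim)
        self.q_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs, last_action, agent_id, hidden):
        x = torch.relu(self.encoder(torch.cat([obs, last_action, agent_id], dim=-1)))
        h, c = self.rnn(x, hidden)          # advance the agent's recurrent state
        return self.q_head(h), (h, c)       # Q-values over actions, new hidden state


# Usage: one shared network, but a separate hidden state per agent.
n_agents, obs_dim, n_actions, hidden_dim = 3, 4, 2, 64
net = RecurrentQNet(obs_dim, n_actions, n_agents, hidden_dim)
hiddens = [(torch.zeros(1, hidden_dim), torch.zeros(1, hidden_dim)) for _ in range(n_agents)]
obs = torch.zeros(1, obs_dim)
last_a = torch.zeros(1, n_actions)
for i in range(n_agents):
    agent_id = torch.zeros(1, n_agents)
    agent_id[0, i] = 1.0
    q_values, hiddens[i] = net(obs, last_a, agent_id, hiddens[i])
    action = q_values.argmax(dim=-1)        # greedy here; epsilon-greedy during learning
```

One motivation for this kind of weight sharing in cooperative settings is that all agents learn a common convention (here, a shared Q-function) while still behaving differently, since each conditions on its own index and its own observation history.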
