Shaping multi-agent systems with gradient reinforcement learning

An original reinforcement learning (RL) methodology is proposed for the design of multi-agent systems. In the realistic setting of situated agents with local perception, the task of automatically building a coordinated system is of crucial importance. To that end, we design simple reactive agents in a decentralized way as independent learners. But to cope with the difficulties inherent to RL used in that framework, we have developed an incremental learning algorithm where agents face a sequence of progressively more complex tasks. We illustrate this general framework by computer experiments where agents have to coordinate to reach a global goal.

[1]  B. Skinner,et al.  Science and human behavior , 1953 .

[2]  Christopher M. Bishop,et al.  Advances in Neural Information Processing Systems 8 (NIPS 1995) , 1991 .

[3]  Toby Tyrrell,et al.  Computational mechanisms for action selection , 1993 .

[4]  Michael I. Jordan,et al.  MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .

[5]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[6]  Michael I. Jordan,et al.  Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.

[7]  David Carmel,et al.  Opponent Modeling in Multi-Agent Systems , 1995, Adaption and Learning in Multi-Agent Systems.

[8]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[9]  Sandip Sen,et al.  Adaption and Learning in Multi-Agent Systems , 1995, Lecture Notes in Computer Science.

[10]  Leslie Pack Kaelbling,et al.  Learning Policies for Partially Observable Environments: Scaling Up , 1997, ICML.

[11]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[12]  Andrew McCallum,et al.  Reinforcement learning with selective perception and hidden state , 1996 .

[13]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[14]  Craig Boutilier,et al.  Planning, Learning and Coordination in Multiagent Decision Processes , 1996, TARK.

[15]  Edmund H. Durfee,et al.  Agents Learning about Agents: A Framework and Analysis , 1997 .

[16]  Maja J. Mataric,et al.  Reinforcement Learning in the Multi-Robot Domain , 1997, Auton. Robots.

[17]  Michael P. Wellman,et al.  Online learning about other agents in a dynamic multiagent system , 1998, AGENTS '98.

[18]  Preben Alstrøm,et al.  Learning to Drive a Bicycle Using Reinforcement Learning and Shaping , 1998, ICML.

[19]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[20]  A. Cassandra,et al.  Exact and approximate algorithms for partially observable markov decision processes , 1998 .

[21]  Kagan Tumer,et al.  General principles of learning-based multi-agent systems , 1999, AGENTS '99.

[22]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[23]  Manuela M. Veloso,et al.  Team-partitioned, opaque-transition reinforcement learning , 1999, AGENTS '99.

[24]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[25]  Kagan Tumer,et al.  An Introduction to Collective Intelligence , 1999, ArXiv.

[26]  G. Di Caro,et al.  Ant colony optimization: a new meta-heuristic , 1999, Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406).

[27]  Neil Immerman,et al.  The Complexity of Decentralized Control of Markov Decision Processes , 2000, UAI.

[28]  Manuela M. Veloso,et al.  Multiagent Systems: A Survey from a Machine Learning Perspective , 2000, Auton. Robots.

[29]  Victor R. Lesser,et al.  Communication in multi-agent Markov decision processes , 2000, Proceedings Fourth International Conference on MultiAgent Systems.

[30]  Kee-Eung Kim,et al.  Learning to Cooperate via Policy Search , 2000, UAI.

[31]  Kagan Tumer,et al.  Collective Intelligence and Braess' Paradox , 2000, AAAI/IAAI.

[32]  Alain Dutech,et al.  Solving POMDPs Using Selected Past Events , 2000, ECAI.

[33]  Manuela M. Veloso,et al.  Layered Learning , 2000, ECML.

[34]  Jette Randløv,et al.  Shaping in Reinforcement Learning by Changing the Physics of the Problem , 2000, ICML.

[35]  Peter L. Bartlett,et al.  Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..

[36]  Peter L. Bartlett,et al.  Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..

[37]  Learning in Large Cooperative Multi-Robot Domains , 2001 .

[38]  Bernhard Hengst,et al.  Discovering Hierarchy in Reinforcement Learning with HEXQ , 2002, ICML.

[39]  Milind Tambe,et al.  The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models , 2011, J. Artif. Intell. Res..

[40]  Yoav Shoham,et al.  Multi-Agent Reinforcement Learning:a critical survey , 2003 .

[41]  Eduardo F. Camacho,et al.  Adaption and learning , 2003 .

[42]  Claudia V. Goldman,et al.  Decentralized language learning through acting , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[43]  Brian P. Gerkey,et al.  A Formal Analysis and Taxonomy of Task Allocation in Multi-Robot Systems , 2004, Int. J. Robotics Res..

[44]  C. Rogowski Model-based Opponent Modelling in Domains Beyond the Prisoner's Dilemma , 2004 .

[45]  Jürgen Schmidhuber,et al.  Learning Team Strategies: Soccer Case Studies , 1998, Machine Learning.

[46]  Olivier Buffet,et al.  Self-Growth of Basic Behaviors in an Action Selection Based Agent , 2004 .

[47]  G. DeJong,et al.  Theory and Application of Reward Shaping in Reinforcement Learning , 2004 .

[48]  Prashant Doshi,et al.  Interactive POMDPs: properties and preliminary results , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[49]  Olivier Buffet,et al.  Développement autonome des comportements de base d'un agent , 2005, Rev. d'Intelligence Artif..

[50]  Luís Torgo,et al.  Machine Learning: ECML 2005, 16th European Conference on Machine Learning, Porto, Portugal, October 3-7, 2005, Proceedings , 2005, ECML.

[51]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[52]  Minoru Asada,et al.  Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning , 2005, Machine Learning.

[53]  Stefan Schaal,et al.  Natural Actor-Critic , 2003, Neurocomputing.

[54]  O. Buffet The Factored Policy Gradient planner ( IPC-06 Version ) , 2006 .