Hierarchical Multiagent Reinforcement Learning

Abstract: In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multiagent tasks. We introduce a hierarchical multiagent reinforcement learning (RL) framework and propose a hierarchical multiagent RL algorithm called Cooperative HRL. In our approach, agents are cooperative and homogeneous, i.e., they use the same task decomposition. Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, the order in which to perform them, and how to coordinate with other agents. We define cooperative subtasks as those subtasks in which coordination among agents significantly improves the performance of the overall task, and we call the levels of the hierarchy that contain cooperative subtasks cooperation levels. Since coordinating at high levels of abstraction improves cooperation skills, as agents are not distracted by low-level details, we usually define cooperative subtasks at the high levels of the hierarchy.
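To make the scheme concrete, below is a minimal sketch of one agent's decision making at a cooperation level: the agent selects among high-level subtasks with SMDP Q-learning, and its value estimates at that level are conditioned on the subtasks the other agents are currently executing. Every class name, parameter, and implementation detail here is an illustrative assumption, not the paper's actual implementation.

```python
import random
from collections import defaultdict


class CooperativeHRLAgent:
    """One agent at a single level of the task hierarchy (sketch).

    At a cooperation level, Q-values are conditioned on the subtasks the
    other agents are currently executing, so coordination is learned over
    high-level subtasks rather than primitive actions. All names and
    parameters are illustrative assumptions.
    """

    def __init__(self, subtasks, is_cooperation_level=True,
                 alpha=0.1, gamma=0.95, epsilon=0.1):
        self.subtasks = list(subtasks)      # children of this level's task
        self.is_cooperation_level = is_cooperation_level
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Q[key][subtask] -> value; the key includes the other agents'
        # subtasks only at a cooperation level.
        self.Q = defaultdict(lambda: defaultdict(float))

    def _key(self, state, others_subtasks):
        # `others_subtasks` is a hashable tuple of the other agents'
        # currently executing high-level subtasks.
        if self.is_cooperation_level:
            return (state, others_subtasks)
        return (state,)

    def choose_subtask(self, state, others_subtasks):
        """Epsilon-greedy selection over this level's subtasks."""
        if random.random() < self.epsilon:
            return random.choice(self.subtasks)
        q = self.Q[self._key(state, others_subtasks)]
        return max(self.subtasks, key=lambda s: q[s])

    def update(self, state, others_subtasks, subtask, reward,
               next_state, next_others, steps):
        """SMDP Q-learning update after `subtask` ran for `steps`
        primitive time steps and accumulated the discounted `reward`."""
        key = self._key(state, others_subtasks)
        next_key = self._key(next_state, next_others)
        best_next = max(self.Q[next_key][s] for s in self.subtasks)
        target = reward + (self.gamma ** steps) * best_next
        self.Q[key][subtask] += self.alpha * (target - self.Q[key][subtask])
```

Conditioning on the other agents' high-level subtasks, rather than on their primitive actions, keeps the joint space small; this is why the abstract argues for defining cooperative subtasks, and hence coordination, at the high levels of the hierarchy.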
