Related Papers

A Comprehensive Survey of Multiagent Reinforcement Learning

Abstract: Multiagent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must, instead, discover a solution on their own, using learning. A significant part of the research on multiagent learning concerns reinforcement learning techniques. This paper provides a comprehensive survey of multiagent reinforcement learning (MARL). A central issue in the field is the formal statement of the multiagent learning goal. Different viewpoints on this issue have led to the proposal of many different goals, among which two focal points can be distinguished: stability of the agents' learning dynamics, and adaptation to the changing behavior of the other agents. The MARL algorithms described in the literature aim, either explicitly or implicitly, at one of these two goals or at a combination of both, in a fully cooperative, fully competitive, or more general setting. A representative selection of these algorithms is discussed in detail in this paper, together with the specific issues that arise in each category. Additionally, the benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied. Finally, an outlook for the field is provided.

References

[1]  J M Smith,et al.  Evolution and the theory of games , 1976 .

[2]  T. Başar,et al.  Dynamic Noncooperative Game Theory , 1982 .

[3]  Richard S. Sutton,et al.  Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[4]  Richard S. Sutton,et al.  Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.

[5]  William S. Lovejoy,et al.  Computationally Feasible Bounds for Partially Observed Markov Decision Processes , 1991, Oper. Res..

[6]  Michael L. Littman,et al.  Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach , 1993, NIPS.

[7]  Michael I. Jordan,et al.  MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .

[8]  Kenneth A. De Jong,et al.  A Cooperative Coevolutionary Approach to Function Optimization , 1994, PPSN.

[9]  Maja J. Mataric,et al.  Reward Functions for Accelerated Learning , 1994, ICML.

[10]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[11]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[12]  Sandip Sen,et al.  Learning to Coordinate without Sharing Information , 1994, AAAI.

[13]  David Carmel,et al.  Opponent Modeling in Multi-Agent Systems , 1995, Adaption and Learning in Multi-Agent Systems.

[14]  Geoffrey J. Gordon Stable Function Approximation in Dynamic Programming , 1995, ICML.

[15]  Maja J. Mataric,et al.  Learning in Multi-Robot Systems , 1995, Adaption and Learning in Multi-Agent Systems.

[16]  Sandip Sen,et al.  Strongly Typed Genetic Programming in Evolving Cooperation Strategies , 1995, ICGA.

[17]  Andrew G. Barto,et al.  Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.

[18]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[19]  Dit-Yan Yeung,et al.  Predictive Q-Routing: A Memory-based Reinforcement Learning Approach to Adaptive Traffic Control , 1995, NIPS.

[20]  Moshe Tennenholtz,et al.  Adaptive Load Balancing: A Study in Multi-Agent Learning , 1994, J. Artif. Intell. Res..

[21]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[22]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[23]  Thomas Bäck,et al.  Evolutionary algorithms in theory and practice - evolution strategies, evolutionary programming, genetic algorithms , 1996 .

[24]  John N. Tsitsiklis,et al.  Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.

[25]  Juergen Schmidhuber,et al.  A General Method For Incremental Self-Improvement And Multi-Agent Learning In Unrestricted Environme , 1999 .

[26]  Craig Boutilier,et al.  Planning, Learning and Coordination in Multiagent Decision Processes , 1996, TARK.

[27]  Maja J. Mataric,et al.  Reinforcement Learning in the Multi-Robot Domain , 1997, Auton. Robots.

[28]  Craig Boutilier,et al.  The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.

[29]  Michael P. Wellman,et al.  Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm , 1998, ICML.

[30]  T. Başar,et al.  Dynamic Noncooperative Game Theory, 2nd Edition , 1998 .

[31]  Victor R. Lesser,et al.  Learning organizational roles for negotiated search in a multiagent system , 1998, Int. J. Hum. Comput. Stud..

[32]  D. Fudenberg,et al.  The Theory of Learning in Games , 1998 .

[33]  Sandip Sen,et al.  Learning in multiagent systems , 1999 .

[34]  H. Van Dyke Parunak,et al.  Industrial and practical applications of DAI , 1999 .

[35]  Manuela M. Veloso,et al.  Team-partitioned, opaque-transition reinforcement learning , 1999, AGENTS '99.

[36]  Craig Boutilier,et al.  Implicit Imitation in Multiagent Reinforcement Learning , 1999, ICML.

[37]  Jürgen Schmidhuber,et al.  Reinforcement Learning Soccer Teams with Incomplete World Models , 1999, Auton. Robots.

[38]  Geoffrey E. Hinton,et al.  Unsupervised learning : foundations of neural computation , 1999 .

[39]  Manuela M. Veloso,et al.  Multiagent Systems: A Survey from a Machine Learning Perspective , 2000, Auton. Robots.

[40]  Martin A. Riedmiller,et al.  Reinforcement Learning for Cooperating and Communicating Reactive Agents in Electrical Power Grids , 2000, Balancing Reactivity and Social Deliberation in Multi-Agent Systems.

[41]  Manuela Veloso,et al.  An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning , 2000 .

[42]  Gerhard Weiss Industrial and Practical Applications of DAI , 2000 .

[43]  Marco Wiering,et al.  Multi-Agent Reinforcement Learning for Traffic Light control , 2000 .

[44]  Claude F. Touzet,et al.  Robot Awareness in Cooperative Mobile Robot Learning , 2000, Auton. Robots.

[45]  Michael H. Bowling,et al.  Convergence Problems of General-Sum Multiagent Reinforcement Learning , 2000, ICML.

[46]  Martin Lauer,et al.  An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems , 2000, ICML.

[47]  Yishay Mansour,et al.  Nash Convergence of Gradient Dynamics in General-Sum Games , 2000, UAI.

[48]  Martin A. Riedmiller,et al.  Karlsruhe Brainstormers - A Reinforcement Learning Approach to Robotic Soccer , 2000, RoboCup.

[49]  Jordan B. Pollack,et al.  A Game-Theoretic Approach to the Simple Coevolutionary Algorithm , 2000, PPSN.

[50]  Klaus Debes,et al.  A reinforcement learning based neural multiagent system for control of a combustion process , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[51]  Guillermo Ricardo Simari,et al.  Multiagent systems: a modern approach to distributed artificial intelligence , 2000 .

[52]  Reda Alhajj,et al.  Multiagent reinforcement learning using function approximation , 2000, IEEE Trans. Syst. Man Cybern. Part C.

[53]  Peter Stone,et al.  Implicit Negotiation in Repeated Games , 2001, ATAL.

[54]  Von-Wun Soo,et al.  Market Performance of Adaptive Trading Agents in Synchronous Double Auctions , 2001, PRIMA.

[55]  Manuela M. Veloso,et al.  Rational and Convergent Learning in Stochastic Games , 2001, IJCAI.

[56]  DeLiang Wang,et al.  Unsupervised Learning: Foundations of Neural Computation , 2001, AI Mag..

[57]  Michael L. Littman,et al.  Value-function reinforcement learning in Markov games , 2001, Cognitive Systems Research.

[58]  Manuela M. Veloso,et al.  Multiagent learning using a variable learning rate , 2002, Artif. Intell..

[59]  Daniel Kudenko,et al.  Reinforcement learning of coordination in cooperative multi-agent systems , 2002, AAAI/IAAI.

[60]  Xiaofeng Wang,et al.  Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games , 2002, NIPS.

[61]  Bernard Manderick,et al.  Q-Learning in Simulated Robotic Soccer - Large State Spaces and Incomplete Information , 2002, ICMLA.

[62]  Byoung-Tak Zhang,et al.  Stock Trading System Using Reinforcement Learning with Cooperative Agents , 2002, ICML.

[63]  Jae Won Lee,et al.  A Multi-agent Q-learning Framework for Optimizing Stock Trading Systems , 2002, DEXA.

[64]  José M. Vidal,et al.  Learning in Multiagent Systems: An Introduction from a Game-Theoretic Perspective , 2003, Adaptive Agents and Multi-Agents Systems.

[65]  Matthijs T. J. Spaan,et al.  High level coordination of agents based on multiagent Markov decision processes with roles , 2002 .

[66]  Akira Hayashi,et al.  A multiagent reinforcement learning algorithm using extended optimal response , 2002, AAMAS '02.

[67]  Michael P. Wellman,et al.  The 2001 trading agent competition , 2002, Electron. Mark..

[68]  Michail G. Lagoudakis,et al.  Coordinated Reinforcement Learning , 2002, ICML.

[69]  Milind Tambe,et al.  The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models , 2011, J. Artif. Intell. Res..

[70]  Georgios Chalkiadakis Multiagent reinforcement learning: stochastic games with multiple learning players , 2003 .

[71]  Nikos Vlassis,et al.  A Concise Introduction to Multiagent Systems and Distributed AI , 2003 .

[72]  Yukinori Kakazu,et al.  An approach to the pursuit problem on a heterogeneous multiagent system using reinforcement learning , 2003, Robotics Auton. Syst..

[73]  C. Boutilier,et al.  Accelerating Reinforcement Learning through Implicit Imitation , 2003, J. Artif. Intell. Res..

[74]  Bikramjit Banerjee,et al.  Adaptive policy gradient in multiagent learning , 2003, AAMAS '03.

[75]  Gerald Tesauro,et al.  Extending Q-Learning to General Adaptive Multi-Agent Systems , 2003, NIPS.

[76]  Yoav Shoham,et al.  Multi-Agent Reinforcement Learning: a critical survey , 2003 .

[77]  Ville Könönen,et al.  Gradient Based Method for Symmetric and Asymmetric Multiagent Reinforcement Learning , 2003, IDEAL.

[78]  Manuela Veloso,et al.  Multiagent learning in the presence of agents with limitations , 2003 .

[79]  William T. B. Uther,et al.  Adversarial Reinforcement Learning , 2003 .

[80]  Michail G. Lagoudakis,et al.  Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..

[81]  R. Paul Wiegand,et al.  Improving Coevolutionary Search for Optimal Multiagent Behaviors , 2003, IJCAI.

[82]  Y. Narahari,et al.  Reinforcement learning applications in dynamic pricing of retail markets , 2003, IEEE International Conference on E-Commerce, 2003. CEC 2003..

[83]  Keith B. Hall,et al.  Correlated Q-Learning , 2003, ICML.

[84]  Michael P. Wellman,et al.  Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..

[85]  Martin Zinkevich,et al.  Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.

[86]  Thomas Miconi When Evolving Populations is Better than Coevolving Individuals: The Blind Mice Problem , 2003, IJCAI.

[87]  Peter Dayan,et al.  Q-learning , 1992, Machine Learning.

[88]  Sridhar Mahadevan,et al.  Hierarchical Multiagent Reinforcement Learning , 2004 .

[89]  Daniel Kudenko,et al.  Reinforcement learning of coordination in heterogeneous cooperative multi-agent systems , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[90]  Q. Henry Wu,et al.  Multi-agent learning for routing control within an Internet environment , 2004, Eng. Appl. Artif. Intell..

[91]  Nikos A. Vlassis,et al.  Sparse cooperative Q-learning , 2004, ICML.

[92]  Yoav Shoham,et al.  New Criteria and a New Algorithm for Learning in Multi-Agent Systems , 2004, NIPS.

[93]  John N. Tsitsiklis,et al.  Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.

[94]  Jürgen Schmidhuber,et al.  Learning Team Strategies: Soccer Case Studies , 1998, Machine Learning.

[95]  Jeffrey O. Kephart,et al.  Pricing in Agent Economies Using Multi-Agent Q-Learning , 2002, Autonomous Agents and Multi-Agent Systems.

[96]  Andrew W. Moore,et al.  Prioritized sweeping: Reinforcement learning with less data and less time , 2004, Machine Learning.

[97]  Peter Dayan,et al.  Technical Note: Q-Learning , 2004, Machine Learning.

[98]  Jeffrey S. Rosenschein,et al.  Best-response multiagent learning in non-stationary environments , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[99]  Michael H. Bowling,et al.  Convergence and No-Regret in Multiagent Learning , 2004, NIPS.

[100]  Ville Könönen,et al.  Asymmetric multiagent reinforcement learning , 2003, Web Intell. Agent Syst..

[101]  Andrew W. Moore,et al.  Variable Resolution Discretization in Optimal Control , 2002, Machine Learning.

[102]  Felix A. Fischer,et al.  Hierarchical reinforcement learning in communication-mediated multiagent coordination , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[103]  Shlomo Zilberstein,et al.  Dynamic Programming for Partially Observable Stochastic Games , 2004, AAAI.

[104]  John N. Tsitsiklis,et al.  Feature-based methods for large scale dynamic programming , 2004, Machine Learning.

[105]  Andrew G. Barto,et al.  Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.

[106]  William D. Smart,et al.  Interpolation-based Q-learning , 2004, ICML.

[107]  Jing Peng,et al.  Incremental multi-step Q-learning , 1994, Machine Learning.

[108]  Mohamed S. Kamel,et al.  Learning Coordination Strategies for Cooperative Multiagent Systems , 1998, Machine Learning.

[109]  Sean Luke,et al.  Cooperative Multi-Agent Learning: The State of the Art , 2005, Autonomous Agents and Multi-Agent Systems.

[110]  Csaba Szepesvári,et al.  Finite time bounds for sampling based fitted value iteration , 2005, ICML.

[111]  Pierre Geurts,et al.  Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..

[112]  Nikos A. Vlassis,et al.  Utile Coordination: Learning Interdependencies Among Cooperative Agents , 2005, CIG.

[113]  Robert Fitch,et al.  Structural Abstraction Experiments in Reinforcement Learning , 2005, Australian Conference on Artificial Intelligence.

[114]  Karl Tuyls,et al.  An Evolutionary Dynamical Analysis of Multi-Agent Learning in Iterated Games , 2005, Autonomous Agents and Multi-Agent Systems.

[115]  E.H.J. Nijhuis,et al.  Cooperative multi-agent reinforcement learning of traffic lights , 2005 .

[116]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[117]  Nikos A. Vlassis,et al.  Non-communicative multi-robot coordination in dynamic environments , 2005, Robotics Auton. Syst..

[118]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[119]  Shin Ishii,et al.  A Reinforcement Learning Scheme for a Partially-Observable Multi-Agent Game , 2005, Machine Learning.

[120]  Ann Nowé,et al.  Evolutionary game theory and multi-agent reinforcement learning , 2005, The Knowledge Engineering Review.

[121]  Nikos A. Vlassis,et al.  Using the Max-Plus Algorithm for Multiagent Decision Making in Coordination Graphs , 2005, BNAIC.

[122]  Bart De Schutter,et al.  Multiagent Reinforcement Learning with Adaptive State Focus , 2005, BNAIC.

[123]  R. Paul Wiegand,et al.  Biasing Coevolutionary Search for Optimal Multiagent Behaviors , 2006, IEEE Transactions on Evolutionary Computation.

[124]  Liming Xiang,et al.  Kernel-Based Reinforcement Learning , 2006, ICIC.

[125]  Bart De Schutter,et al.  Decentralized Reinforcement Learning Control of a Robotic Manipulator , 2006, 2006 9th International Conference on Control, Automation, Robotics and Vision.

[126]  Vincent Conitzer,et al.  AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents , 2003, Machine Learning.

[127]  Olivier Buffet,et al.  Shaping multi-agent systems with gradient reinforcement learning , 2007, Autonomous Agents and Multi-Agent Systems.

[128]  Shin Ishii,et al.  Multiagent reinforcement learning applied to a chase problem in a continuous world , 2001, Artificial Life and Robotics.

[129]  Sridhar Mahadevan,et al.  Hierarchical multi-agent reinforcement learning , 2001, AGENTS '01.

[130]  Colin R. Reeves,et al.  Evolutionary computation: a unified approach , 2007, Genetic Programming and Evolvable Machines.

[131]  Rémi Munos,et al.  Performance Bounds in Lp-norm for Approximate Value Iteration , 2007, SIAM J. Control. Optim..

[132]  De,et al.  Relational Reinforcement Learning , 2022 .

Cited by
Towards optimising modality allocation for multimodal output generation in incremental dialogue
2012
Uncovering demand flexibility in buildings : a smart grid inter-operation framework for the optimization of energy and comfort
2017
A reputation-based framework to support dynamic car-pooling
Intelligenza Artificiale
2020
Cooperative reinforcement learning for independent learners
2014
Learning to Play: Reinforcement Learning and Games
2020
Accelerated Method based on Reinforcement Learning and Case Base Reasoning in Multi agent Systems
2012
A Study of Recommender Systems Using Markov Decision Process
2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS)
2018
CLEAN Learning to Improve Coordination and Scalability in Multiagent Systems
2013
A Policy Synthesis-Based Framework for Robot Rescue Decision-Making in Multi-Robot Exploration of Disaster Sites
2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR)
2018
Kooperierende Mobile Roboter
Autom.
2013
Deception in Social Learning: A Multi-Agent Reinforcement Learning Perspective
ArXiv
2021
Swarm and pheromone based reinforcement learning methods for the robot(s) path search problem
2016 IEEE 1st International Conference on Power Electronics, Intelligent Control and Energy Systems (ICPEICES)
2016
Stigmergic Independent Reinforcement Learning for Multiagent Collaboration
IEEE Transactions on Neural Networks and Learning Systems
2019
Multi-agent modeling and simulation in the AI age
2021
Traffic Engineering in Software-defined Networks using Reinforcement Learning: A Review
2021
Cooperative Multi-Agent Reinforcement-Learning-Based Distributed Dynamic Spectrum Access in Cognitive Radio Networks
IEEE Internet of Things Journal
2021
A Novel Network Selection Approach in 5G Heterogeneous Networks Using Q-Learning
2019 26th International Conference on Telecommunications (ICT)
2019
Research on Bidding Strategy of Thermal Power Companies in Electricity Market Based on Multi-Agent Deep Deterministic Policy Gradient
IEEE Access
2021
D2D power control based on supervised and unsupervised learning
2017 3rd IEEE International Conference on Computer and Communications (ICCC)
2017
Entropy Controlled Non-Stationarity for Improving Performance of Independent Learners in Anonymous MARL Settings
ArXiv
2018