Paper information - A Comprehensive Survey of Multiagent Reinforcement Learning

Adaptive Learning: A New Decentralized Reinforcement Learning Approach for Cooperative Multiagent Systems

Multiagent systems (MASs) have received extensive attention in a variety of domains, such as robotics and distributed control. This paper focuses on how independent learners (ILs, structures used in decentralized reinforcement learning) decide on their individual behaviors to achieve coherent joint behavior. To date, reinforcement learning (RL) approaches for ILs have not guaranteed convergence to the optimal joint policy in scenarios in which communication is difficult. In a decentralized algorithm in particular, the share of credit due to a single agent's action is not distinguished, which can lead to miscoordination of joint actions. It is therefore important to study the mechanisms of coordination between agents in MASs. Most previous coordination mechanisms work by modeling the communication mechanism and the policies of other agents. These methods apply only to a particular system, so they do not generalize, especially when there are dozens of agents or more. This paper therefore focuses on MASs that contain more than a dozen agents. By using parallel computation, the experimental environment is brought closer to the application setting. Building on the paradigm of centralized training and decentralized execution (CTDE), a multiagent reinforcement learning algorithm for implicit coordination based on the TD error is proposed. The new algorithm dynamically adjusts the learning rate, drawing on an analysis of the dissonance problem in matrix games combined with a multiagent environment. Adjusting the learning rate dynamically between agents makes it possible to coordinate the agents' strategies. Experimental results show that the proposed algorithm can effectively improve the coordination ability of a MAS. Moreover, the variance of the training results is more stable than that of the hysteretic Q-learning (HQL) algorithm. Hence, the problem of miscoordination in a MAS can be avoided to some extent without additional communication. Our work provides a new way to solve the miscoordination problem for reinforcement learning algorithms at the scale of dozens of agents or more. Because this is a new IL-structured algorithm, our results should be extended and studied further.
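For orientation, the hysteretic Q-learning (HQL) baseline named in the abstract above can be sketched for two independent learners in a stateless cooperative matrix game. This is only an illustrative sketch of HQL as commonly described (a larger learning rate for non-negative TD errors, a smaller one for negative TD errors); it is not the algorithm proposed in the abstract. The payoff matrix (in the style of the "climbing game"), the function names, and all parameter values are assumptions chosen for illustration.

```python
import random

# Illustrative cooperative payoff matrix in the style of the "climbing game":
# rows are agent 0's actions, columns are agent 1's actions (assumed values).
CLIMBING = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def hysteretic_update(q, reward, alpha, beta):
    """One stateless hysteretic Q-update.

    The TD error is reward - q. Non-negative errors are learned with the
    larger rate alpha, negative errors with the smaller rate beta.
    """
    td_error = reward - q
    rate = alpha if td_error >= 0 else beta
    return q + rate * td_error


def train(payoff, episodes=5000, alpha=0.1, beta=0.01, epsilon=0.2, seed=0):
    """Two independent learners, each with a private Q-table, no communication."""
    rng = random.Random(seed)
    n = len(payoff)
    q = [[0.0] * n for _ in range(2)]  # q[i][a]: agent i's value for action a
    for _ in range(episodes):
        actions = []
        for i in range(2):
            if rng.random() < epsilon:
                actions.append(rng.randrange(n))  # explore
            else:
                actions.append(max(range(n), key=lambda a: q[i][a]))  # exploit
        reward = payoff[actions[0]][actions[1]]  # shared team reward
        for i in range(2):
            a = actions[i]
            q[i][a] = hysteretic_update(q[i][a], reward, alpha, beta)
    return q


if __name__ == "__main__":
    for agent, values in enumerate(train(CLIMBING)):
        print(agent, [round(v, 2) for v in values])
```

Because beta is smaller than alpha, each learner is slow to lower its estimate when a teammate's exploratory action causes a penalty, which is the optimistic bias HQL relies on to reduce miscoordination without communication.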

A Comprehensive Survey of Multiagent Reinforcement Learning

Bart De Schutter

Lucian Buşoniu

Robert Babuška

Abstract: Multiagent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must, instead, discover a solution on their own, using learning. A significant part of the research on multiagent learning concerns reinforcement learning techniques. This paper provides a comprehensive survey of multiagent reinforcement learning (MARL). A central issue in the field is the formal statement of the multiagent learning goal. Different viewpoints on this issue have led to the proposal of many different goals, among which two focal points can be distinguished: stability of the agents' learning dynamics, and adaptation to the changing behavior of the other agents. The MARL algorithms described in the literature aim, either explicitly or implicitly, at one of these two goals or at a combination of both, in a fully cooperative, fully competitive, or more general setting. A representative selection of these algorithms is discussed in detail in this paper, together with the specific issues that arise in each category. Additionally, the benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied. Finally, an outlook for the field is provided.

References

[1] J M Smith,et al. Evolution and the theory of games , 1976 .

[2] T. Başar,et al. Dynamic Noncooperative Game Theory , 1982 .

[3] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[4] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.

[5] William S. Lovejoy,et al. Computationally Feasible Bounds for Partially Observed Markov Decision Processes , 1991, Oper. Res..

[6] Michael L. Littman,et al. Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach , 1993, NIPS.

[7] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .

[8] Kenneth A. De Jong,et al. A Cooperative Coevolutionary Approach to Function Optimization , 1994, PPSN.

[9] Maja J. Mataric,et al. Reward Functions for Accelerated Learning , 1994, ICML.

[10] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[11] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[12] Sandip Sen,et al. Learning to Coordinate without Sharing Information , 1994, AAAI.

[13] David Carmel,et al. Opponent Modeling in Multi-Agent Systems , 1995, Adaption and Learning in Multi-Agent Systems.

[14] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.

[15] Maja J. Mataric,et al. Learning in Multi-Robot Systems , 1995, Adaption and Learning in Multi-Agent Systems.

[16] Sandip Sen,et al. Strongly Typed Genetic Programming in Evolving Cooperation Strategies , 1995, ICGA.

[17] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.

[18] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[19] Dit-Yan Yeung,et al. Predictive Q-Routing: A Memory-based Reinforcement Learning Approach to Adaptive Traffic Control , 1995, NIPS.

[20] Moshe Tennenholtz,et al. Adaptive Load Balancing: A Study in Multi-Agent Learning , 1994, J. Artif. Intell. Res..

[21] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[22] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[23] Thomas Bäck,et al. Evolutionary algorithms in theory and practice - evolution strategies, evolutionary programming, genetic algorithms , 1996 .

[24] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.

[25] Juergen Schmidhuber,et al. A General Method For Incremental Self-Improvement And Multi-Agent Learning In Unrestricted Environme , 1999 .

[26] Craig Boutilier,et al. Planning, Learning and Coordination in Multiagent Decision Processes , 1996, TARK.

[27] Maja J. Mataric,et al. Reinforcement Learning in the Multi-Robot Domain , 1997, Auton. Robots.

[28] Craig Boutilier,et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.

[29] Michael P. Wellman,et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm , 1998, ICML.

[30] T. Başar,et al. Dynamic Noncooperative Game Theory, 2nd Edition , 1998 .

[31] Victor R. Lesser,et al. Learning organizational roles for negotiated search in a multiagent system , 1998, Int. J. Hum. Comput. Stud..

[32] D. Fudenberg,et al. The Theory of Learning in Games , 1998 .

[33] Sandip Sen,et al. Learning in multiagent systems , 1999 .

[34] H. Van Dyke Parunak,et al. Industrial and practical applications of DAI , 1999 .

[35] Manuela M. Veloso,et al. Team-partitioned, opaque-transition reinforcement learning , 1999, AGENTS '99.

[36] Craig Boutilier,et al. Implicit Imitation in Multiagent Reinforcement Learning , 1999, ICML.

[37] Jürgen Schmidhuber,et al. Reinforcement Learning Soccer Teams with Incomplete World Models , 1999, Auton. Robots.

[38] Geoffrey E. Hinton,et al. Unsupervised learning : foundations of neural computation , 1999 .

[39] Manuela M. Veloso,et al. Multiagent Systems: A Survey from a Machine Learning Perspective , 2000, Auton. Robots.

[40] Martin A. Riedmiller,et al. Reinforcement Learning for Cooperating and Communicating Reactive Agents in Electrical Power Grids , 2000, Balancing Reactivity and Social Deliberation in Multi-Agent Systems.

[41] Manuela Veloso,et al. An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning , 2000 .

[42] Gerhard Weiss. Industrial and Practical Applications of DAI , 2000 .

[43] Marco Wiering,et al. Multi-Agent Reinforcement Learning for Traffic Light control , 2000 .

[44] Claude F. Touzet,et al. Robot Awareness in Cooperative Mobile Robot Learning , 2000, Auton. Robots.

[45] Michael H. Bowling,et al. Convergence Problems of General-Sum Multiagent Reinforcement Learning , 2000, ICML.

[46] Martin Lauer,et al. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems , 2000, ICML.

[47] Yishay Mansour,et al. Nash Convergence of Gradient Dynamics in General-Sum Games , 2000, UAI.

[48] Martin A. Riedmiller,et al. Karlsruhe Brainstormers - A Reinforcement Learning Approach to Robotic Soccer , 2000, RoboCup.

[49] Jordan B. Pollack,et al. A Game-Theoretic Approach to the Simple Coevolutionary Algorithm , 2000, PPSN.

[50] Klaus Debes,et al. A reinforcement learning based neural multiagent system for control of a combustion process , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[51] Guillermo Ricardo Simari,et al. Multiagent systems: a modern approach to distributed artificial intelligence , 2000 .

[52] Reda Alhajj,et al. Multiagent reinforcement learning using function approximation , 2000, IEEE Trans. Syst. Man Cybern. Part C.

[53] Peter Stone,et al. Implicit Negotiation in Repeated Games , 2001, ATAL.

[54] Von-Wun Soo,et al. Market Performance of Adaptive Trading Agents in Synchronous Double Auctions , 2001, PRIMA.

[55] Manuela M. Veloso,et al. Rational and Convergent Learning in Stochastic Games , 2001, IJCAI.

[56] DeLiang Wang,et al. Unsupervised Learning: Foundations of Neural Computation , 2001, AI Mag..

[57] Michael L. Littman,et al. Value-function reinforcement learning in Markov games , 2001, Cognitive Systems Research.

[58] Manuela M. Veloso,et al. Multiagent learning using a variable learning rate , 2002, Artif. Intell..

[59] Daniel Kudenko,et al. Reinforcement learning of coordination in cooperative multi-agent systems , 2002, AAAI/IAAI.

[60] Xiaofeng Wang,et al. Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games , 2002, NIPS.

[61] Bernard Manderick,et al. Q-Learning in Simulated Robotic Soccer - Large State Spaces and Incomplete Information , 2002, ICMLA.

[62] Byoung-Tak Zhang,et al. Stock Trading System Using Reinforcement Learning with Cooperative Agents , 2002, ICML.

[63] Jae Won Lee,et al. A Multi-agent Q-learning Framework for Optimizing Stock Trading Systems , 2002, DEXA.

[64] José M. Vidal,et al. Learning in Multiagent Systems: An Introduction from a Game-Theoretic Perspective , 2003, Adaptive Agents and Multi-Agents Systems.

[65] Matthijs T. J. Spaan,et al. High level coordination of agents based on multiagent Markov decision processes with roles , 2002 .

[66] Akira Hayashi,et al. A multiagent reinforcement learning algorithm using extended optimal response , 2002, AAMAS '02.

[67] Michael P. Wellman,et al. The 2001 trading agent competition , 2002, Electron. Mark..

[68] Michail G. Lagoudakis,et al. Coordinated Reinforcement Learning , 2002, ICML.

[69] Milind Tambe,et al. The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models , 2011, J. Artif. Intell. Res..

[70] Georgios Chalkiadakis. Multiagent reinforcement learning: stochastic games with multiple learning players , 2003 .

[71] Nikos Vlassis,et al. A Concise Introduction to Multiagent Systems and Distributed AI , 2003 .

[72] Yukinori Kakazu,et al. An approach to the pursuit problem on a heterogeneous multiagent system using reinforcement learning , 2003, Robotics Auton. Syst..

[73] C. Boutilier,et al. Accelerating Reinforcement Learning through Implicit Imitation , 2003, J. Artif. Intell. Res..

[74] Bikramjit Banerjee,et al. Adaptive policy gradient in multiagent learning , 2003, AAMAS '03.

[75] Gerald Tesauro,et al. Extending Q-Learning to General Adaptive Multi-Agent Systems , 2003, NIPS.

[76] Yoav Shoham,et al. Multi-Agent Reinforcement Learning:a critical survey , 2003 .

[77] Ville Könönen,et al. Gradient Based Method for Symmetric and Asymmetric Multiagent Reinforcement Learning , 2003, IDEAL.

[78] Manuela Veloso,et al. Multiagent learning in the presence of agents with limitations , 2003 .

[79] William T. B. Uther,et al. Adversarial Reinforcement Learning , 2003 .

[80] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..

[81] R. Paul Wiegand,et al. Improving Coevolutionary Search for Optimal Multiagent Behaviors , 2003, IJCAI.

[82] Y. Narahari,et al. Reinforcement learning applications in dynamic pricing of retail markets , 2003, IEEE International Conference on E-Commerce, 2003. CEC 2003..

[83] Keith B. Hall,et al. Correlated Q-Learning , 2003, ICML.

[84] Michael P. Wellman,et al. Nash Q-Learning for General-Sum Stochastic Games , 2003, J. Mach. Learn. Res..

[85] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.

[86] Thomas Miconi. When Evolving Populations is Better than Coevolving Individuals: The Blind Mice Problem , 2003, IJCAI.

[87] Peter Dayan,et al. Q-learning , 1992, Machine Learning.

[88] Sridhar Mahadevan,et al. Hierarchical Multiagent Reinforcement Learning , 2004 .

[89] Daniel Kudenko,et al. Reinforcement learning of coordination in heterogeneous cooperative multi-agent systems , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[90] Q. Henry Wu,et al. Multi-agent learning for routing control within an Internet environment , 2004, Eng. Appl. Artif. Intell..

[91] Nikos A. Vlassis,et al. Sparse cooperative Q-learning , 2004, ICML.

[92] Yoav Shoham,et al. New Criteria and a New Algorithm for Learning in Multi-Agent Systems , 2004, NIPS.

[93] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.

[94] Jürgen Schmidhuber,et al. Learning Team Strategies: Soccer Case Studies , 1998, Machine Learning.

[95] Jeffrey O. Kephart,et al. Pricing in Agent Economies Using Multi-Agent Q-Learning , 2002, Autonomous Agents and Multi-Agent Systems.

[96] Andrew W. Moore,et al. Prioritized sweeping: Reinforcement learning with less data and less time , 2004, Machine Learning.

[97] Peter Dayan,et al. Technical Note: Q-Learning , 2004, Machine Learning.

[98] Jeffrey S. Rosenschein,et al. Best-response multiagent learning in non-stationary environments , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[99] Michael H. Bowling,et al. Convergence and No-Regret in Multiagent Learning , 2004, NIPS.

[100] Ville Könönen,et al. Asymmetric multiagent reinforcement learning , 2003, Web Intell. Agent Syst..

[101] Andrew W. Moore,et al. Variable Resolution Discretization in Optimal Control , 2002, Machine Learning.

[102] Felix A. Fischer,et al. Hierarchical reinforcement learning in communication-mediated multiagent coordination , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[103] Shlomo Zilberstein,et al. Dynamic Programming for Partially Observable Stochastic Games , 2004, AAAI.

[104] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.

[105] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.

[106] William D. Smart,et al. Interpolation-based Q-learning , 2004, ICML.

[107] Jing Peng,et al. Incremental multi-step Q-learning , 1994, Machine Learning.

[108] Mohamed S. Kamel,et al. Learning Coordination Strategies for Cooperative Multiagent Systems , 1998, Machine Learning.

[109] Sean Luke,et al. Cooperative Multi-Agent Learning: The State of the Art , 2005, Autonomous Agents and Multi-Agent Systems.

[110] Csaba Szepesvári,et al. Finite time bounds for sampling based fitted value iteration , 2005, ICML.

[111] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..

[112] Nikos A. Vlassis,et al. Utile Coordination: Learning Interdependencies Among Cooperative Agents , 2005, CIG.

[113] Robert Fitch,et al. Structural Abstraction Experiments in Reinforcement Learning , 2005, Australian Conference on Artificial Intelligence.

[114] Karl Tuyls,et al. An Evolutionary Dynamical Analysis of Multi-Agent Learning in Iterated Games , 2005, Autonomous Agents and Multi-Agent Systems.

[115] E.H.J. Nijhuis,et al. Cooperative multi-agent reinforcement learning of traffic lights , 2005 .

[116] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[117] Nikos A. Vlassis,et al. Non-communicative multi-robot coordination in dynamic environments , 2005, Robotics Auton. Syst..

[118] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[119] Shin Ishii,et al. A Reinforcement Learning Scheme for a Partially-Observable Multi-Agent Game , 2005, Machine Learning.

[120] Ann Nowé,et al. Evolutionary game theory and multi-agent reinforcement learning , 2005, The Knowledge Engineering Review.

[121] Nikos A. Vlassis,et al. Using the Max-Plus Algorithm for Multiagent Decision Making in Coordination Graphs , 2005, BNAIC.

[122] Bart De Schutter,et al. Multiagent Reinforcement Learning with Adaptive State Focus , 2005, BNAIC.

[123] R. Paul Wiegand,et al. Biasing Coevolutionary Search for Optimal Multiagent Behaviors , 2006, IEEE Transactions on Evolutionary Computation.

[124] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.

[125] Bart De Schutter,et al. Decentralized Reinforcement Learning Control of a Robotic Manipulator , 2006, 2006 9th International Conference on Control, Automation, Robotics and Vision.

[126] Vincent Conitzer,et al. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents , 2003, Machine Learning.

[127] Olivier Buffet,et al. Shaping multi-agent systems with gradient reinforcement learning , 2007, Autonomous Agents and Multi-Agent Systems.

[128] Shin Ishii,et al. Multiagent reinforcement learning applied to a chase problem in a continuous world , 2001, Artificial Life and Robotics.

[129] Sridhar Mahadevan,et al. Hierarchical multi-agent reinforcement learning , 2001, AGENTS '01.

[130] Colin R. Reeves,et al. Evolutionary computation: a unified approach , 2007, Genetic Programming and Evolvable Machines.

[131] Rémi Munos,et al. Performance Bounds in Lp-norm for Approximate Value Iteration , 2007, SIAM J. Control. Optim..

[132] De,et al. Relational Reinforcement Learning , 2022 .

Cited by

Towards optimising modality allocation for multimodal output generation in incremental dialogue, 2012.

Uncovering demand flexibility in buildings: a smart grid inter-operation framework for the optimization of energy and comfort, 2017.

Accelerated Method based on Reinforcement Learning and Case Base Reasoning in Multi agent Systems, 2012.

A Study of Recommender Systems Using Markov Decision Process, 2018, 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS).

CLEAN Learning to Improve Coordination and Scalability in Multiagent Systems, 2013.

A Policy Synthesis-Based Framework for Robot Rescue Decision-Making in Multi-Robot Exploration of Disaster Sites, 2018, 2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR).

Kooperierende Mobile Roboter, 2013, Autom.

Deception in Social Learning: A Multi-Agent Reinforcement Learning Perspective, 2021, ArXiv.

Swarm and pheromone based reinforcement learning methods for the robot(s) path search problem, 2016, 2016 IEEE 1st International Conference on Power Electronics, Intelligent Control and Energy Systems (ICPEICES).

Stigmergic Independent Reinforcement Learning for Multiagent Collaboration, 2019, IEEE Transactions on Neural Networks and Learning Systems.

Multi-agent modeling and simulation in the AI age, 2021.

Traffic Engineering in Software-defined Networks using Reinforcement Learning: A Review, 2021.

Cooperative Multi-Agent Reinforcement-Learning-Based Distributed Dynamic Spectrum Access in Cognitive Radio Networks, 2021, IEEE Internet of Things Journal.

A Novel Network Selection Approach in 5G Heterogeneous Networks Using Q-Learning, 2019, 2019 26th International Conference on Telecommunications (ICT).

Research on Bidding Strategy of Thermal Power Companies in Electricity Market Based on Multi-Agent Deep Deterministic Policy Gradient, 2021, IEEE Access.

D2D power control based on supervised and unsupervised learning, 2017, 2017 3rd IEEE International Conference on Computer and Communications (ICCC).

Entropy Controlled Non-Stationarity for Improving Performance of Independent Learners in Anonymous MARL Settings, 2018, ArXiv.

A reputation-based framework to support dynamic car-pooling

Cooperative reinforcement learning for independent learners

Learning to Play: Reinforcement Learning and Games
