On Local Rewards and Scaling Distributed Reinforcement Learning

We consider how the number of examples needed to achieve good performance in distributed, cooperative, multi-agent reinforcement learning scales with the number of agents n. We prove a worst-case lower bound showing that algorithms that rely solely on a global reward signal to learn policies confront a fundamental limit: they require a number of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that a class of algorithms that take advantage of local reward signals in large distributed Markov Decision Processes can ensure good performance with a number of samples that scales as O(log n), making them applicable even in settings with a very large number of agents n.