Fast Concurrent Reinforcement Learners

When several agents learn concurrently, the payoff received by an agent depends on the behavior of the other agents. As the other agents learn, the reward an agent receives becomes non-stationary. This makes learning in multiagent systems more difficult than single-agent learning. A few methods, however, are known to guarantee convergence to equilibrium in the limit in such systems. In this paper we experimentally study one such technique, minimax-Q, in a competitive domain and prove its equivalence with another well-known method for competitive domains. We study the rate of convergence of minimax-Q and investigate possible ways of increasing it. We also present a variant of the algorithm, minimax-SARSA, and prove its convergence to minimax-Q values under appropriate conditions. Finally, we show that this new algorithm also performs better than simple minimax-Q in a general-sum domain.
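For readers unfamiliar with the technique named in the abstract, the following is a minimal, illustrative sketch of a tabular minimax-Q style backup for a two-player zero-sum Markov game; the function names, parameters (alpha, gamma), array shapes, and the use of scipy.optimize.linprog to solve the stage-game linear program are our assumptions for illustration, not the implementation studied in the paper.

```python
# Illustrative minimax-Q style update (assumed names and shapes).
import numpy as np
from scipy.optimize import linprog

def minimax_value(Q_s):
    """Solve max_pi min_o sum_a pi(a) * Q_s[a, o] via linear programming.

    Q_s: (n_actions, n_opponent_actions) payoff matrix at one state.
    Returns the game value v and the maximizing mixed policy pi.
    """
    n_a, n_o = Q_s.shape
    # Variables: [pi_1, ..., pi_{n_a}, v]; linprog minimizes, so minimize -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For each opponent action o: v - sum_a pi(a) Q_s[a, o] <= 0.
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # Mixed policy must sum to one.
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    b_eq = np.ones(1)
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n_a]

def minimax_q_update(Q, V, s, a, o, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular backup after observing (state, action, opponent action, reward, next state)."""
    Q[s, a, o] = (1 - alpha) * Q[s, a, o] + alpha * (r + gamma * V[s_next])
    V[s], _ = minimax_value(Q[s])
    return Q, V
```

A SARSA-style variant would replace the bootstrapped value V[s_next] with the Q-value of the action pair actually taken at the next state; this sketch only shows the standard minimax backup.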