Playing is believing: The role of beliefs in multi-agent learning

We propose a new classification for multi-agent learning algorithms, in which each league of players is characterized by both its possible strategies and its possible beliefs. Using this classification, we review the optimality of existing algorithms, including the case of interleague play. We also propose an incremental improvement to existing algorithms that appears to achieve long-run average payoffs at least as high as the Nash equilibrium payoffs against fair opponents.
