Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games

In this paper, we consider multi-agent learning via online gradient descent (OGD) in a class of games called $\lambda$-cocoercive games, a fairly broad class of games that admits many Nash equilibria and that properly includes unconstrained strongly monotone games. We characterize the finite-time last-iterate convergence rate for joint OGD learning on $\lambda$-cocoercive games. Building on this result, we develop a fully adaptive OGD learning algorithm that does not require any knowledge of the problem parameters (e.g., the cocoercivity constant $\lambda$), and we show, via a novel double stopping time technique, that this adaptive algorithm achieves the same finite-time last-iterate convergence rate as its non-adaptive counterpart. Subsequently, we extend OGD learning to the noisy gradient feedback case and establish last-iterate convergence results, first qualitative almost sure convergence and then quantitative finite-time convergence rates, all under non-decreasing step-sizes. To our knowledge, we provide the first set of results that fill several gaps in the existing multi-agent online learning literature, in which three aspects (finite-time convergence rates, non-decreasing step-sizes, and fully adaptive algorithms) had previously been unexplored.
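To make the basic (non-adaptive, noiseless) setting concrete, here is a minimal sketch of joint OGD with a constant step size. It assumes a cost-minimization convention in which the stacked individual gradients $F(x)$ satisfy the cocoercivity condition $\langle F(x)-F(y), x-y\rangle \ge \lambda \|F(x)-F(y)\|^2$; the paper may state this with payoffs and the opposite sign. The two-player game, the function names `player_grad` and `joint_ogd`, and the step size $\eta = \lambda/2$ are hypothetical illustrations, not the paper's algorithm. The adaptive variant and the noisy-feedback extension are not shown.

```python
from typing import Callable, List, Sequence

# Hypothetical two-player game (not from the paper): both players share the cost
#   c_i(x1, x2) = 0.5 * (x1 + x2 - 1)^2,
# so every profile with x1 + x2 = 1 is a Nash equilibrium (a continuum of them).
# The joint gradient operator F(x) = (x1 + x2 - 1, x1 + x2 - 1) is linear with a
# symmetric PSD matrix whose largest eigenvalue is 2, hence lambda-cocoercive
# with lambda = 1/2, but it is not strongly monotone.

def player_grad(i: int, x: Sequence[float]) -> float:
    """Partial gradient of player i's cost w.r.t. its own action.
    Both players have the same partial gradient here, so i is unused."""
    return x[0] + x[1] - 1.0

def joint_ogd(x0: Sequence[float],
              grads: Callable[[int, Sequence[float]], float],
              eta: float,
              num_steps: int) -> List[float]:
    """Simultaneous online gradient descent: every player steps along its own
    partial gradient, evaluated at the current joint action profile."""
    x = list(x0)
    for _ in range(num_steps):
        g = [grads(i, x) for i in range(len(x))]     # each player's gradient feedback
        x = [xi - eta * gi for xi, gi in zip(x, g)]  # simultaneous update
    return x

LAMBDA = 0.5        # cocoercivity constant of this toy game
eta = 0.5 * LAMBDA  # constant (non-decreasing) step size inside (0, 2 * LAMBDA)
x_last = joint_ogd([2.0, -3.0], player_grad, eta, num_steps=100)
print(x_last)  # last iterate, close to the equilibrium (3, -2)
```

In this toy game both players receive the same gradient, so $x_1 - x_2$ is preserved along the trajectory while $x_1 + x_2 - 1$ shrinks by a factor $1 - 2\eta$ per step. The last iterate therefore settles on the particular Nash equilibrium $(3, -2)$ selected by the starting point, rather than on a unique equilibrium.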
