Online calibrated forecasts: Memory efficiency versus universality for learning in games

We provide a simple learning process that enables an agent to forecast a sequence of outcomes. Our forecasting scheme, termed the tracking forecast, is based on tracking past observations while emphasizing recent outcomes. In contrast to other forecasting schemes, we sacrifice universality in favor of significantly reduced memory requirements. We show that if the sequence of outcomes has a certain structure, namely some internal (hidden) state that does not change too rapidly, then the tracking forecast is weakly calibrated, so that the forecast appears correct most of the time. For binary outcomes, this result holds without any internal-state assumptions. We then consider learning in a repeated strategic game in which each player attempts to compute a forecast of the opponent's actions and play a best response to it. We show that if one player uses a tracking forecast while the other uses a standard learning algorithm (such as exponential regret matching or smooth fictitious play), then the player using the tracking forecast obtains a best response to the actual play of the other player. We further show that if both players use tracking forecasts, then, under certain conditions on the game matrix, convergence to a Nash equilibrium occurs with positive probability for a larger class of games than the class for which smooth fictitious play converges to a Nash equilibrium.
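The abstract describes the tracking forecast as one that follows past observations while emphasizing recent outcomes, using far less memory than universal calibration schemes. A minimal sketch of this idea for binary outcomes, assuming a constant-step-size update toward the latest outcome (the paper's exact update rule may differ):

```python
def tracking_forecast(outcomes, step=0.1):
    """Forecast each binary outcome before seeing it.

    Memory requirement is O(1): only the current forecast is stored,
    in contrast with calibration schemes that maintain statistics over
    the full history of forecast-outcome pairs.
    """
    f = 0.5               # uninformative initial forecast
    forecasts = []
    for y in outcomes:    # y in {0, 1}
        forecasts.append(f)
        f += step * (y - f)   # move toward the most recent outcome
    return forecasts

# On a slowly changing sequence the forecast tracks the prevailing
# outcome rate, so it appears (weakly) calibrated on that sequence.
fs = tracking_forecast([1] * 200)
```

A constant step size is what gives the scheme its tracking character: old observations are discounted geometrically, so the forecast can follow a hidden state that drifts slowly, at the price of universality on arbitrary sequences.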
