Regret minimization in repeated matrix games with variable stage duration

Regret minimization in repeated matrix games has been extensively studied ever since Hannan's seminal paper [Hannan, J., 1957. Approximation to Bayes risk in repeated play. In: Dresher, M., Tucker, A.W., Wolfe, P. (Eds.), Contributions to the Theory of Games, vol. III. Ann. of Math. Stud., vol. 39, Princeton Univ. Press, Princeton, NJ, pp. 97-139]. Several classes of no-regret strategies now exist; such strategies secure a long-term average payoff as high as could be obtained by the fixed action that is best, in hindsight, against the observed action sequence of the opponent. We consider an extension of this framework to repeated games with variable stage duration, where the duration of each stage may depend on the actions of both players, and the performance measure of interest is the average payoff per unit time. We start by showing that no-regret strategies, in the above sense, do not exist in general. Consequently, we consider two classes of adaptive strategies, one based on Blackwell's approachability theorem and the other on calibrated play, and examine their performance guarantees. We further provide sufficient conditions for the existence of no-regret strategies in this model.
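To make the performance measure concrete, here is a minimal Python sketch, assuming a hypothetical encoding that is not taken from the paper: a payoff matrix `r[a][b]` and a strictly positive duration matrix `tau[a][b]`, indexed by our action `a` and the opponent's action `b`. It computes the average payoff per unit time (total payoff divided by total elapsed time), the best per-unit-time average that a single fixed action would have earned against the opponent's observed stage-by-stage actions, and their difference as an empirical regret. The function names are illustrative only; the sketch measures regret after the fact and is not one of the adaptive strategies studied in the paper.

```python
from typing import Sequence, Tuple

# Hypothetical encoding (an assumption, not the paper's notation):
# r[a][b] is the stage payoff and tau[a][b] > 0 the stage duration when
# we play action a and the opponent plays action b.
Matrix = Sequence[Sequence[float]]


def average_payoff_per_unit_time(
    r: Matrix, tau: Matrix, ours: Sequence[int], theirs: Sequence[int]
) -> float:
    """Total payoff divided by total elapsed time over the played stages."""
    if len(ours) != len(theirs) or not ours:
        raise ValueError("need two non-empty action sequences of equal length")
    total_payoff = sum(r[a][b] for a, b in zip(ours, theirs))
    total_time = sum(tau[a][b] for a, b in zip(ours, theirs))
    return total_payoff / total_time


def best_fixed_action_in_hindsight(
    r: Matrix, tau: Matrix, theirs: Sequence[int]
) -> Tuple[int, float]:
    """Best per-unit-time average obtainable by repeating one action against
    the opponent's observed (stage-indexed) action sequence."""

    def value_of(a: int) -> float:
        # Counterfactual: keep the opponent's stage-by-stage actions fixed.
        return average_payoff_per_unit_time(r, tau, [a] * len(theirs), theirs)

    best = max(range(len(r)), key=value_of)
    return best, value_of(best)


def empirical_regret(
    r: Matrix, tau: Matrix, ours: Sequence[int], theirs: Sequence[int]
) -> float:
    """Hindsight benchmark minus the per-unit-time average actually obtained."""
    _, benchmark = best_fixed_action_in_hindsight(r, tau, theirs)
    return benchmark - average_payoff_per_unit_time(r, tau, ours, theirs)
```

For example, with `r = [[1, 0], [0, 2]]`, `tau = [[1, 1], [1, 4]]`, opponent actions `[0, 1, 1, 0]` and our actions `[0, 0, 1, 1]`, the played sequence earns 3 units of payoff over 7 units of time (3/7). Repeating action 0 would have earned 2/4 = 0.5 per unit time, while repeating action 1 collects more total payoff (4) but only 4/10 = 0.4 per unit time, so the hindsight benchmark is 0.5 and the empirical regret is 1/14. A per-stage average would instead have favored action 1, which is why the ratio of payoff to time, not the payoff sum, is the relevant criterion in this model.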

Shie Mannor | Nahum Shimkin

[1] Shie Mannor, et al. The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes, 2003, Math. Oper. Res.

[2] S. Hart, et al. A simple adaptive procedure leading to correlated equilibrium, 2000.

[3] D. Fudenberg, et al. Conditional Universal Consistency, 1999.

[4] Ehud Lehrer. A wide range no-regret theorem, 2003, Games Econ. Behav.

[5] Stephen P. Boyd, et al. Convex Optimization, 2004, Cambridge University Press.

[6] R. Vohra, et al. Calibrated Learning and Correlated Equilibrium, 1996.

[7] A. Rustichini. Minimizing Regret: The General Case, 1999.

[8] D. Fudenberg, et al. An Easier Way to Calibrate, 1999.

[9] A. Shwartz, et al. Guaranteed performance regions in Markovian systems with competing decision makers, 1993, IEEE Trans. Autom. Control.

[10] S. Hart. Adaptive Heuristics, 2005.

[11] S. Hart, et al. A General Class of Adaptive Strategies, 1999.

[12] Sham M. Kakade, et al. Deterministic calibration and Nash equilibrium, 2004, J. Comput. Syst. Sci.

[13] Philip Wolfe, et al. Contributions to the theory of games, 1953.

[14] D. Fudenberg, et al. The Theory of Learning in Games, 1998.

[15] Yuhong Yang. Elements of Information Theory (2nd ed.). Thomas M. Cover and Joy A. Thomas, 2008.

[16] Martin Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[17] Y. Freund, et al. Adaptive game playing using multiplicative weights, 1999.

[18] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[19] Alvaro Sandroni, et al. Calibration with Many Checking Rules, 2003, Math. Oper. Res.

[20] Yishay Mansour, et al. Experts in a Markov Decision Process, 2004, NIPS.

[21] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[22] D. Fudenberg, et al. Consistency and Cautious Fictitious Play, 1995.

[23] Gábor Lugosi, et al. Prediction, learning, and games, 2006.

[24] D. Blackwell. An analog of the minimax theorem for vector payoffs, 1956.

[25] Dean P. Foster, et al. Regret in the On-Line Decision Problem, 1999.

[26] D. Blackwell. Controlled Random Walks, 2010.

[27] Dean P. Foster. A Proof of Calibration Via Blackwell's Approachability Theorem, 1999.

[28] E. Kalai, et al. Calibrated Forecasting and Merging, 1999.

[29] L. Shapley. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.

[30] James Hannan. Approximation to Bayes Risk in Repeated Play, 1957.

[31] Shie Mannor, et al. Online calibrated forecasts: Memory efficiency versus universality for learning in games, 2006, Machine Learning.

[32] Sagnik Sinha, et al. Zero-sum two-person semi-Markov games, 1992, Journal of Applied Probability.