Robust approachability and regret minimization in games with partial monitoring

Approachability has become a standard tool in analyzing learning algorithms in the adversarial online learning setup. We develop a variant of approachability for games where the obtained reward is ambiguous: it is known only to belong to a set, rather than being a single vector. Using this variant, we tackle the problem of approachability in games with partial monitoring and develop simple and efficient algorithms (i.e., with constant per-step complexity) for this setup. Finally, we consider external regret and internal regret in repeated games with partial monitoring and derive regret-minimizing strategies based on approachability theory.
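
To illustrate the link between approachability and regret minimization mentioned above, here is a minimal sketch of the classical regret-matching strategy, which can be read as Blackwell approachability of the nonpositive orthant by the vector of regrets. It is not the algorithm of this paper: it assumes full monitoring (the opponent's action is observed each round) and payoffs in [0, 1]. The function name `regret_matching` and the rock-paper-scissors payoff matrix are hypothetical choices made for illustration only.

```python
import random


def regret_matching(payoff, opponent_actions):
    """Run regret matching and return the maximal average external regret.

    payoff[i][j] is the player's payoff in [0, 1] for action i against
    opponent action j. Assumption: full monitoring, i.e. j is observed.
    Regret is measured against the expected payoff of the mixed action.
    """
    k = len(payoff)
    cum_regret = [0.0] * k  # cumulative regret toward each fixed action
    for j in opponent_actions:
        # Mixed action proportional to the positive part of the regrets
        # (uniform when no regret is positive).
        pos = [max(r, 0.0) for r in cum_regret]
        total = sum(pos)
        p = [x / total for x in pos] if total > 0 else [1.0 / k] * k
        expected = sum(p[i] * payoff[i][j] for i in range(k))
        # Regret vector update: payoff of each fixed action minus ours.
        for i in range(k):
            cum_regret[i] += payoff[i][j] - expected
    return max(cum_regret) / len(opponent_actions)


if __name__ == "__main__":
    # Rock-paper-scissors rescaled to [0, 1]: win = 1, tie = 0.5, loss = 0.
    rps = [[0.5, 0.0, 1.0],
           [1.0, 0.5, 0.0],
           [0.0, 1.0, 0.5]]
    rng = random.Random(0)
    opponent = [rng.randrange(3) for _ in range(10000)]
    print(regret_matching(rps, opponent))
```

With this choice of mixed action, the inner product between the positive part of the cumulative regret and the next regret increment is zero, which is Blackwell's condition. The standard argument then bounds the maximal average regret by sqrt(K/T) for K actions and T rounds, whatever the opponent plays. The partial-monitoring setting of the paper is harder, since the regret vector itself is not observed.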
