Opportunistic Strategies for Generalized No-Regret Problems

This paper considers a generalized no-regret problem with vector-valued rewards, defined in terms of a desired reward set of the agent. For each mixed action q of the opponent, the agent has a set R*(q) in which the average reward should reside. In addition, the agent has a response mixed action p that brings the expected reward under these two actions, r(p, q), into R*(q). If a strategy of the agent ensures that the average reward converges to R*(q̄_n), where q̄_n is the empirical distribution of the opponent's actions, for any strategy of the opponent, we say that it is a no-regret strategy with respect to R*(q). When the multifunction q ↦ R*(q) is convex, as is the case in the standard no-regret problem, no-regret strategies can be devised. Our main interest in this paper is in cases where this convexity property does not hold. The best that can be guaranteed in general is then the convergence of the average reward to R^c(q̄_n), the convex hull of R*(q̄_n). However, as the game unfolds, it may turn out that the opponent's choices of actions are limited in some way. If these restrictions were known in advance, the agent could possibly ensure convergence of the average reward to some desired subset of R^c(q̄_n), or even approach R*(q̄_n) itself. We formulate appropriate goals for opportunistic no-regret strategies, in the sense that they may exploit such limitations on the opponent's action sequence in an online manner, without knowing them beforehand. As the main technical tool, we propose a class of approachability algorithms that rely on a calibrated forecast of the opponent's actions and are opportunistic in the above sense. As an application, we consider the online no-regret problem with average cost constraints, introduced in Mannor, Tsitsiklis, and Yu (2009). We show, in particular, that our algorithm attains the best-response-in-hindsight for this problem if the opponent's play happens to be stationary, or close to stationary in a certain sense.
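To make the forecast-based scheme concrete, here is a minimal Python sketch of the "play a response to a forecast of the opponent" idea in its simplest special case: rewards are scalar and R*(q) is the standard no-regret target {x : x ≥ max_p r(p, q)}, so the response p*(q) reduces to a best response against q. The empirical-frequency forecaster below is a naive stand-in for a genuinely calibrated forecaster and is adequate only against (near-)stationary opponents; the payoff matrix and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar rewards r(i, j) for agent action i against opponent action j.
# Hypothetical 2x2 payoff matrix, chosen only for illustration.
R = np.array([[1.0, 0.0],
              [0.2, 0.9]])

def best_response(q_hat):
    """p*(q): a pure action maximizing the expected reward r(., q_hat)."""
    return int(np.argmax(R @ q_hat))

T = 5000
counts = np.ones(2)                  # Laplace-smoothed opponent action counts
rewards = []
opp_actions = []

for t in range(T):
    q_hat = counts / counts.sum()    # naive forecast of the opponent's action
    i = best_response(q_hat)         # play the response to the forecast
    j = rng.choice(2, p=[0.7, 0.3])  # opponent happens to play a stationary mix
    rewards.append(R[i, j])
    opp_actions.append(j)
    counts[j] += 1

q_bar = np.bincount(opp_actions, minlength=2) / T   # empirical distribution
print("average reward             :", np.mean(rewards))
print("best-response-in-hindsight :", float(np.max(R @ q_bar)))
```

Against a stationary opponent the empirical frequencies converge to the true mixed action, so the average reward approaches the best-response-in-hindsight value max_p r(p, q̄_n), mirroring the stationary-opponent guarantee stated above. Against an adaptive opponent this naive forecaster can be exploited, which is precisely why the construction in the paper relies on calibrated forecasts instead.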

[1] Shie Mannor et al. The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes, 2003, Math. Oper. Res.

[2] Xavier Spinat et al. A Necessary and Sufficient Condition for Approachability, 2002, Math. Oper. Res.

[3] Peter L. Bartlett et al. Blackwell Approachability and No-Regret Learning are Equivalent, 2010, COLT.

[4] A. Shwartz et al. Guaranteed Performance Regions in Markovian Systems with Competing Decision Makers, 1993, IEEE Trans. Autom. Control.

[5] Shie Mannor et al. Online Calibrated Forecasts: Memory Efficiency versus Universality for Learning in Games, 2006, Machine Learning.

[6] Andreu Mas-Colell et al. A General Class of Adaptive Strategies, 1999, J. Econ. Theory.

[7] Vianney Perchet et al. Calibration and Internal No-Regret with Partial Monitoring, 2010, arXiv.

[8] Gábor Lugosi et al. Prediction, Learning, and Games, 2006.

[9] Emanuel Milman. Approachable Sets of Vector Payoffs in Stochastic Games, 2006, Games Econ. Behav.

[10] D. Blackwell. Controlled Random Walks, 1954, Proc. Int. Congress of Mathematicians.

[11] D. Blackwell. An Analog of the Minimax Theorem for Vector Payoffs, 1956, Pacific J. Math.

[12] R. Vohra et al. Calibrated Learning and Correlated Equilibrium, 1996.

[13] Shie Mannor et al. Online Classification with Specificity Constraints, 2010, NIPS.

[14] Shie Mannor et al. Regret Minimization in Repeated Matrix Games with Variable Stage Duration, 2008, Games Econ. Behav.

[15] James Hannan. Approximation to Bayes Risk in Repeated Play, 1958.

[16] A. Dawid. Comment: The Impossibility of Inductive Inference, 1985.

[17] John N. Tsitsiklis et al. Online Learning with Sample Path Constraints, 2009, J. Mach. Learn. Res.

[18] Ehud Lehrer et al. Approachability in Infinite Dimensional Spaces, 2003, Int. J. Game Theory.

[19] Shie Mannor et al. Online Learning for Global Cost Functions, 2009, COLT.

[20] Sham M. Kakade et al. (Weak) Calibration is Computationally Hard, 2012, COLT.

[21] Ambuj Tewari et al. Complexity-Based Approach to Calibration with Checking Rules, 2011, COLT.

[22] Philip Wolfe et al. Contributions to the Theory of Games, 1953.