Approachability in unknown games: Online learning meets multi-objective optimization

In the standard setting of approachability there are two players and a target set. The players repeatedly play a known vector-valued game in which the first player wants the average vector-valued payoff to converge to the target set, while the other player tries to keep it away from this set. We revisit this setting in the spirit of online learning and do not assume that the first player knows the game structure: she receives an arbitrary vector-valued reward at every round. She wishes to approach the smallest ("best") possible set given the observed average payoffs in hindsight. This extension of the standard setting has implications even when the original target set is not approachable and when it is not obvious which expansion of it should be approached instead. We show that it is impossible, in general, to approach the best target set in hindsight, and we propose achievable though ambitious alternative goals. We further propose a concrete strategy to approach these goals. Our method does not require projection onto a target set and amounts to switching between scalar regret minimization algorithms that are run in episodes. Applications to global cost minimization and to approachability under sample path constraints are considered.
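To make the notion of approaching a target set concrete, the following minimal sketch tracks the Euclidean distance between the running average of a sequence of vector-valued rewards and a target set. It is an illustration only and not the strategy proposed in the paper: the target set is assumed to be an axis-aligned box, and the function names `distance_to_box` and `average_payoff_distances` as well as the reward sequence are hypothetical choices made for this example.

```python
import math

def distance_to_box(point, lower, upper):
    """Euclidean distance from `point` to the box [lower, upper] (assumed target set)."""
    # Clipping each coordinate gives the closest point of an axis-aligned box.
    clipped = [min(max(x, lo), hi) for x, lo, hi in zip(point, lower, upper)]
    return math.dist(point, clipped)

def average_payoff_distances(payoffs, lower, upper):
    """Distance of the running average payoff to the box after each round."""
    total = [0.0] * len(lower)
    distances = []
    for t, reward in enumerate(payoffs, start=1):
        total = [s + x for s, x in zip(total, reward)]
        average = [s / t for s in total]
        distances.append(distance_to_box(average, lower, upper))
    return distances

# Hypothetical reward sequence: the two coordinates alternate between 0 and 1.
payoffs = [(1.0, 0.0), (0.0, 1.0)] * 50
distances = average_payoff_distances(payoffs, lower=(0.0, 0.0), upper=(0.5, 0.5))
print(distances[0], distances[-1])
```

In this toy sequence the average payoff starts outside the box and ends inside it, so the distance shrinks to zero. The first player's goal in approachability is to guarantee such convergence whatever rewards the other player generates.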
