Paper information - Online Learning with Constraints

Online Learning with Constraints

We study online learning in which the decision maker aims to maximize her long-term average reward subject to the requirement that certain average constraints are satisfied along the sample path. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature's choices in advance. We show that, in general, the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint, the convex hull turns out to be the highest attainable function. We further provide an explicit strategy that attains this convex hull using a calibrated forecasting rule.
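To make the two target functions concrete, here is a minimal Python sketch under several assumptions: finite action sets, a Nature with only two actions (so Nature's empirical mixed action is a single number q, the frequency of its action 1), and a single constraint of the form c(p, q) <= gamma. For a fixed q, the reward-in-hindsight r*(q) is a small linear program over the decision maker's mixed action p; with one constraint its optimum sits either at a feasible pure action or at a two-action mixture on which the constraint is tight, so the sketch simply enumerates those candidates. It also assumes that the "convex hull" of r* means its lower convex envelope (the largest convex function lying below r*), approximated on a grid. The payoff matrices R and C, the threshold GAMMA, and all function names are hypothetical, and the sketch only computes the targets; it does not implement the calibrated-forecasting strategy.

```python
from itertools import combinations


def reward_in_hindsight(q, R, C, gamma):
    """r*(q): best expected reward over mixed actions p subject to c(p, q) <= gamma.

    q is the probability that Nature plays its action 1 (Nature has two actions).
    R[a][b] and C[a][b] are the reward and cost when the decision maker plays a
    and Nature plays b. Returns None if no mixed action meets the constraint.
    """
    # Expected reward and cost of each pure action against Nature's mixed action q.
    r = [(1 - q) * row[0] + q * row[1] for row in R]
    c = [(1 - q) * row[0] + q * row[1] for row in C]
    candidates = []
    # With a single linear constraint, the LP optimum is attained at a vertex of
    # {p in simplex : sum_a p_a * c[a] <= gamma}: (i) a feasible pure action ...
    for a in range(len(r)):
        if c[a] <= gamma:
            candidates.append(r[a])
    # ... or (ii) a two-action mixture on which the constraint holds with equality.
    for a, b in combinations(range(len(r)), 2):
        if (c[a] - gamma) * (c[b] - gamma) < 0:
            w = (gamma - c[b]) / (c[a] - c[b])  # weight on a: w*c[a] + (1-w)*c[b] == gamma
            candidates.append(w * r[a] + (1 - w) * r[b])
    return max(candidates) if candidates else None


def lower_convex_envelope(xs, ys):
    """Vertices of the lower convex hull of the points (xs[i], ys[i]); xs must be increasing."""
    hull = []
    for x, y in zip(xs, ys):
        # Drop the last vertex while it lies on or above the segment from hull[-2] to (x, y).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull


def evaluate_piecewise_linear(vertices, x):
    """Linear interpolation between consecutive envelope vertices."""
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("x lies outside the envelope's domain")


# Hypothetical 2x2 game: action 0 is "safe" (reward 0, cost 0); action 1 earns
# reward 1 but costs 1 when Nature plays its action 0 and nothing otherwise.
R = [[0.0, 0.0], [1.0, 1.0]]
C = [[0.0, 0.0], [1.0, 0.0]]
GAMMA = 0.5

grid = [i / 100 for i in range(101)]
# Action 0 is always feasible here, so r* is never None on the grid.
r_star = [reward_in_hindsight(q, R, C, GAMMA) for q in grid]
hull = lower_convex_envelope(grid, r_star)  # assumption: "convex hull" = lower convex envelope

print(r_star[50], evaluate_piecewise_linear(hull, 0.5))  # 1.0 0.75
```

In this toy game r*(0.5) = 1 while the envelope is only 0.75 there; the gap is the kind of shortfall between r* and its convex hull that the abstract refers to. Enumerating vertices avoids a solver dependency but relies on there being exactly one constraint; with several constraints a general LP solver would be the safer choice.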

John N. Tsitsiklis | Shie Mannor