Paper information - Online Learning with Sample Path Constraints

Online Learning with Sample Path Constraints

We study online learning where a decision maker interacts with Nature with the objective of maximizing her long-term average reward subject to some sample path average constraints. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature's choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint, the convex hull turns out to be the highest attainable function. Using a calibrated forecasting rule, we provide an explicit strategy that attains this convex hull. We also measure the performance of heuristic methods based on non-calibrated forecasters in experiments involving a CPU power management problem.
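As an illustration of the reward-in-hindsight defined in the abstract, the following minimal sketch computes it for a single constraint in a small matrix game. It assumes a finite action set for both players, a reward matrix `R`, a cost matrix `C`, an empirical distribution `q` of Nature's choices, and a constraint level `c0`. All of these names and the toy numbers are hypothetical and are not taken from the paper. With one constraint, the best mixed action is either a feasible pure action or a mixture of two actions that meets the constraint with equality, so a simple enumeration suffices.

```python
from itertools import combinations


def reward_in_hindsight(R, C, q, c0):
    """Highest average reward subject to average cost <= c0, given q.

    R[a][b], C[a][b]: reward and cost when the decision maker plays a
    and Nature plays b (hypothetical inputs for illustration).
    q: empirical distribution of Nature's actions.
    Returns None if no mixed action satisfies the constraint.
    """
    n = len(R)
    # Expected reward and cost of each pure action against q.
    r = [sum(qb * R[a][b] for b, qb in enumerate(q)) for a in range(n)]
    c = [sum(qb * C[a][b] for b, qb in enumerate(q)) for a in range(n)]

    best = None
    # Candidate 1: pure actions that already satisfy the constraint.
    for a in range(n):
        if c[a] <= c0:
            best = r[a] if best is None else max(best, r[a])

    # Candidate 2: mixtures of two actions with the constraint tight.
    for i, j in combinations(range(n), 2):
        lo, hi = (i, j) if c[i] < c[j] else (j, i)
        if c[lo] < c0 < c[hi]:
            w = (c0 - c[lo]) / (c[hi] - c[lo])  # weight on the costlier action
            value = (1 - w) * r[lo] + w * r[hi]
            best = value if best is None else max(best, value)
    return best


# Toy example (made-up numbers): action 0 = low power, action 1 = high power;
# Nature picks a light (0) or heavy (1) load.
R = [[1.0, 0.0], [1.0, 1.0]]   # low power only succeeds under light load
C = [[0.0, 0.0], [1.0, 1.0]]   # high power costs one unit of energy
print(reward_in_hindsight(R, C, [0.5, 0.5], 0.5))  # 0.75
```

The sketch only evaluates the benchmark for a fixed `q`. It does not implement the calibrated-forecasting strategy that the abstract says attains the convex hull of this function.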

John N. Tsitsiklis | Shie Mannor | Jia Yuan Yu

[1] S. Vajda, et al. Contribution to the Theory of Games, 1951.

[2] D. Blackwell. An analog of the minimax theorem for vector payoffs, 1956.

[3] Nahum Shimkin, et al. Stochastic Games with Average Cost Constraints, 1994.

[4] Dean P. Foster, et al. Calibrated Learning and Correlated Equilibrium, 1997.

[5] John N. Tsitsiklis, et al. Introduction to linear optimization, 1997, Athena scientific optimization and computation series.

[6] E. Altman. Constrained Markov Decision Processes, 1999.

[7] Xavier Spinat, et al. A Necessary and Sufficient Condition for Approachability, 2002, Math. Oper. Res.

[8] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[9] Shie Mannor, et al. The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes, 2003, Math. Oper. Res.

[10] Shie Mannor, et al. A Geometric Approach to Multi-Criterion Reinforcement Learning, 2004, J. Mach. Learn. Res.

[11] Gábor Lugosi, et al. Prediction, learning, and games, 2006.

[12] Shie Mannor, et al. Online calibrated forecasts: Memory efficiency versus universality for learning in games, 2006, Machine Learning.

[13] Nimrod Megiddo, et al. Online Learning with Prior Knowledge, 2007, COLT.

[14] Shie Mannor, et al. Online Learning with Expert Advice and Finite-Horizon Constraints, 2008, AAAI.

[15] D. Blackwell. Controlled Random Walks, 2010.