Paper information - Online Learning with Expert Advice and Finite-Horizon Constraints

In this paper, we study a sequential decision-making problem. The objective is to maximize the average reward accumulated over time subject to temporal cost constraints. The novelty of our setup is that the rewards and constraints are controlled by an adversarial opponent. To solve our problem in a practical way, we propose an expert algorithm that guarantees both a vanishing regret and a sublinear number of violated constraints. The quality of this solution is demonstrated on a real-world power management problem. Our results support the hypothesis that online learning with convex cost constraints can be performed successfully in practice.
