Online Learning for Global Cost Functions

We consider an online learning setting in which, at each time step, the decision maker must choose how to distribute the future loss among k alternatives, and then observes the loss of each alternative. Motivated by load balancing and job scheduling, we consider a global cost function over the losses accumulated by each alternative, rather than the sum of the instantaneous losses, as is traditional in online learning. Examples of such global cost functions include the makespan (the maximum accumulated loss over the alternatives) and the L_d norm of the vector of accumulated losses. Using approachability theory, we design an algorithm that guarantees vanishing regret in this setting, where regret is measured with respect to the best static decision, that is, the one that selects the same distribution over alternatives at every time step. For the special case of the makespan cost, we devise a simple and efficient algorithm. In contrast, we show that for concave global cost functions, such as L_d norms with d < 1, the worst-case average regret does not vanish.
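To make the objects in the abstract concrete, here is a minimal sketch, assuming non-negative losses, that computes the accumulated loss vector, the two global costs (makespan and L_d norm), the best static distribution in hindsight, and the resulting regret. It is not the approachability-based algorithm or the makespan algorithm from the paper; it only evaluates a given sequence of allocations. All function names are hypothetical, and the closed forms for the static optimum are a derivation under the stated assumptions: for makespan, equalizing every alpha_i * S_i (with S_i the total loss of alternative i) gives alpha_i proportional to 1 / S_i; for the L_d norm with d > 1, the first-order condition gives alpha_i proportional to S_i^(-d/(d-1)). Regret here is the plain difference of global costs; dividing by the horizon T gives an average.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def accumulated_loss(allocations: Sequence[Sequence[float]],
                     losses: Sequence[Sequence[float]]) -> Vector:
    """Per-alternative loss L_i = sum_t alpha_t[i] * l_t[i]."""
    k = len(losses[0])
    totals = [0.0] * k
    for alpha, loss in zip(allocations, losses):
        for i in range(k):
            totals[i] += alpha[i] * loss[i]
    return totals


def makespan(totals: Sequence[float]) -> float:
    """Global cost: the maximum accumulated loss over the k alternatives."""
    return max(totals)


def ld_norm(totals: Sequence[float], d: float) -> float:
    """Global cost: the L_d norm of the accumulated loss vector."""
    return sum(x ** d for x in totals) ** (1.0 / d)


def column_sums(losses: Sequence[Sequence[float]]) -> Vector:
    """S_i: total loss of alternative i over all time steps."""
    return [sum(loss[i] for loss in losses) for i in range(len(losses[0]))]


def best_static_makespan(losses: Sequence[Sequence[float]]) -> Tuple[Vector, float]:
    """Best fixed distribution for makespan (assumes non-negative losses).

    The static cost is max_i alpha_i * S_i; making every term equal, with
    alpha_i proportional to 1 / S_i, is optimal and costs 1 / sum_i (1 / S_i).
    """
    sums = column_sums(losses)
    for i, s in enumerate(sums):
        if s == 0.0:  # a loss-free alternative absorbs everything at zero cost
            return [1.0 if j == i else 0.0 for j in range(len(sums))], 0.0
    inv = [1.0 / s for s in sums]
    z = sum(inv)
    return [v / z for v in inv], 1.0 / z


def best_static_ld(losses: Sequence[Sequence[float]], d: float) -> Tuple[Vector, float]:
    """Best fixed distribution for the L_d norm, d > 1 (assumes every S_i > 0).

    Minimizing sum_i (alpha_i * S_i)^d over the simplex gives
    alpha_i proportional to S_i^(-d/(d-1)).
    """
    sums = column_sums(losses)
    q = d / (d - 1.0)
    w = [s ** (-q) for s in sums]
    z = sum(w)
    alpha = [v / z for v in w]
    return alpha, ld_norm([a * s for a, s in zip(alpha, sums)], d)


def makespan_regret(allocations: Sequence[Sequence[float]],
                    losses: Sequence[Sequence[float]]) -> float:
    """Makespan of the played allocations minus that of the best static one."""
    _, best = best_static_makespan(losses)
    return makespan(accumulated_loss(allocations, losses)) - best


# Two alternatives over T = 3 steps; both have total loss S_i = 2.
losses = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform = [[0.5, 0.5]] * 3   # same split every step (the static optimum here)
greedy = [[1.0, 0.0]] * 3    # always route everything to alternative 0
print(makespan_regret(uniform, losses))  # 0.0
print(makespan_regret(greedy, losses))   # 1.0
```

In this toy instance the uniform split is itself the best static decision, so its regret is zero, while always loading alternative 0 ends with accumulated losses (2, 0) and a makespan one unit worse than the static optimum of 1.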
