Online Dynamic Programming

We consider the problem of repeatedly solving a variant of the same dynamic programming problem in successive trials. An instance of the type of problems we consider is to find a good binary search tree in a changing environment. At the beginning of each trial, the learner probabilistically chooses a tree with the $n$ keys at the internal nodes and the $n+1$ gaps between keys at the leaves. The learner is then told the frequencies of the keys and gaps and is charged the average search cost of the chosen tree. The problem is online because the frequencies can change between trials. The goal is to develop algorithms whose total average search cost (loss) over all trials is close to the total loss of the best single tree chosen in hindsight for all trials. The challenge, of course, is that the algorithm has to deal with an exponential number of trees. We develop a general methodology for tackling such problems for a wide class of dynamic programming algorithms. Our framework allows us to extend online learning algorithms like Hedge and Component Hedge to a significantly wider class of combinatorial objects than was possible before.
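To make the comparator concrete, the following is a minimal sketch of the classical offline dynamic program that finds the cost of the best binary search tree for one fixed set of frequencies. This is what "the best tree chosen in hindsight" would cost if the frequencies were summed or averaged over trials. It is not the online algorithm of the paper. The function name `optimal_bst_cost` and the cost convention (every node visited on the search path, including the leaf for a gap, counts as one unit) are assumptions taken from the standard textbook formulation; the paper's exact loss definition may differ.

```python
from functools import lru_cache


def optimal_bst_cost(key_freq, gap_freq):
    """Minimum average search cost over all binary search trees.

    key_freq[k] is the frequency of key k (internal nodes), k = 0..n-1.
    gap_freq[g] is the frequency of gap g (leaves), g = 0..n.
    Assumed cost convention: a node at depth d costs d + 1.
    """
    n = len(key_freq)
    if len(gap_freq) != n + 1:
        raise ValueError("need exactly n + 1 gap frequencies for n keys")

    # Prefix sums so the total weight of any subproblem is O(1) to compute.
    key_prefix = [0.0]
    for f in key_freq:
        key_prefix.append(key_prefix[-1] + f)
    gap_prefix = [0.0]
    for f in gap_freq:
        gap_prefix.append(gap_prefix[-1] + f)

    def weight(i, j):
        # Total frequency of keys i..j-1 and gaps i..j.
        return (key_prefix[j] - key_prefix[i]) + (gap_prefix[j + 1] - gap_prefix[i])

    @lru_cache(maxsize=None)
    def cost(i, j):
        # Subproblem (i, j): best subtree holding keys i..j-1 and gaps i..j.
        if i == j:
            return gap_freq[i]  # a single leaf
        # Choosing root r pushes every node in the subproblem one level down,
        # which adds weight(i, j) to the cost of the two optimal subtrees.
        return weight(i, j) + min(cost(i, r) + cost(r + 1, j) for r in range(i, j))

    return cost(0, n)
```

The sketch runs in $O(n^3)$ time over $O(n^2)$ subproblems, even though the number of candidate trees is exponential in $n$. That gap between the number of subproblems and the number of solutions is the structure the abstract refers to when it speaks of dynamic programming algorithms.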
