Soft-Bayes: Prod for Mixtures of Experts with Log-Loss

We consider prediction with expert advice under the log-loss, with the goal of deriving efficient and robust algorithms. We argue that existing algorithms such as exponentiated gradient, online gradient descent, and online Newton step do not adequately satisfy both requirements. Our main contribution is an analysis of the Prod algorithm that is robust to any data sequence and runs in linear time per round in the number of experts. Despite the unbounded nature of the log-loss, we derive a bound that is independent of the largest loss and of the largest gradient, depending only on the number of experts and the time horizon. Furthermore, we give a Bayesian interpretation of Prod and adapt the algorithm to derive a tracking regret bound.
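To make the per-round update concrete, below is a minimal Python sketch of a Prod-style (Soft-Bayes) mixture over experts under log-loss. The function name, the horizon-dependent learning rate, and the final renormalization step are illustrative assumptions rather than details taken from the paper; the sketch only shows the multiplicative update and its linear per-round cost in the number of experts.

```python
import numpy as np

def soft_bayes(expert_probs, eta=None):
    """Prod-style (Soft-Bayes) mixing of experts under log-loss.

    expert_probs: array of shape (T, N); entry [t, i] is the probability
    that expert i assigned to the outcome actually observed at round t.
    Returns the cumulative log-loss of the mixture forecaster.

    Illustrative sketch: the learning-rate choice below is an assumption,
    not the schedule analyzed in the paper.
    """
    T, N = expert_probs.shape
    if eta is None:
        # illustrative horizon-dependent learning rate
        eta = min(0.5, np.sqrt(np.log(N) / (N * T)))

    w = np.full(N, 1.0 / N)          # uniform prior over experts
    total_loss = 0.0
    for t in range(T):
        x = expert_probs[t]
        mix = w @ x                   # mixture probability of the observed outcome
        total_loss += -np.log(mix)    # log-loss suffered this round
        # Prod-style multiplicative update:
        #   w_i <- w_i * (1 - eta + eta * x_i / mix)
        # The update preserves sum(w) = 1 exactly; the renormalization
        # below only guards against floating-point drift.
        w = w * (1.0 - eta + eta * x / mix)
        w = w / w.sum()
    return total_loss
```

Each round costs O(N) time, matching the linear per-round complexity claimed in the abstract, and the mixture weights admit the Bayesian reading of a posterior that is only partially updated by each observation.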
