Online Learning with Automata-based Expert Sequences

We consider a general framework of online learning with expert advice in which regret is defined with respect to sequences of experts accepted by a weighted automaton. This framework covers several previously studied problems, including that of competing against k-shifting experts. We give a series of algorithms for this problem, including an automata-based algorithm that extends weighted majority, as well as more efficient algorithms based on the notion of failure transitions. We further present efficient algorithms that rely on an approximation of the competitor automaton, in particular n-gram models obtained by minimizing the ∞-Rényi divergence, and give an extensive study of the approximation properties of such models. Finally, we extend our algorithms and results to the framework of sleeping experts.
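To make the comparison class concrete, one natural way to write the regret notion described above is the following, ignoring automaton weights and using symbols introduced here purely for illustration: ℓ_t denotes the round-t loss vector, i_t the learner's choice, and L_T(A) the set of length-T expert sequences accepted by the automaton A.

```latex
R_T \;=\; \sum_{t=1}^{T} \ell_t(i_t) \;-\; \min_{(e_1,\dots,e_T)\,\in\, L_T(\mathcal{A})} \; \sum_{t=1}^{T} \ell_t(e_t)
```

Competing against k-shifting experts is then the special case in which A accepts exactly the expert sequences with at most k switches.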

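As background for the k-shifting case, the sketch below shows the classical fixed-share variant of the weighted-majority (exponentially weighted average) forecaster of Herbster and Warmuth, which the automata-based algorithms above generalize. It is a minimal illustration with illustrative parameter values, not the paper's algorithm.

```python
import numpy as np

def fixed_share(losses, eta=0.5, alpha=0.05):
    """Fixed-share forecaster over N experts (uniform-mixing variant).

    losses: (T, N) array of per-round expert losses in [0, 1].
    eta:    learning rate of the multiplicative update.
    alpha:  switching rate; alpha = 0 recovers plain weighted majority.
    Returns the (T, N) sequence of probability vectors played.
    """
    T, N = losses.shape
    w = np.full(N, 1.0 / N)               # uniform prior over experts
    plays = np.empty((T, N))
    for t in range(T):
        plays[t] = w
        v = w * np.exp(-eta * losses[t])  # multiplicative (Hedge) update
        v /= v.sum()
        # Share step: redistribute a fraction alpha of the mass uniformly,
        # so experts that become good after a shift can recover weight.
        w = (1.0 - alpha) * v + alpha / N
    return plays

# Toy usage: the best expert shifts once, halfway through the horizon.
T, N = 100, 3
losses = np.ones((T, N))
losses[: T // 2, 0] = 0.0   # expert 0 is best early
losses[T // 2 :, 2] = 0.0   # expert 2 is best late
probs = fixed_share(losses)
```

With alpha = 0 the forecaster concentrates on expert 0 and cannot reallocate mass after the shift; a small positive alpha keeps every expert's weight bounded away from zero, which is what yields regret guarantees against k-shifting comparators.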