Online Learning with Automata-based Expert Sequences

We consider a general framework of online learning with expert advice in which regret is defined with respect to sequences of experts accepted by a weighted automaton. This framework covers several previously studied problems, including that of competing against k-shifting experts. We give a series of algorithms for this problem, including an automata-based algorithm that extends weighted majority, as well as more efficient algorithms based on the notion of failure transitions. We further present efficient algorithms that rely on an approximation of the competitor automaton, in particular n-gram models obtained by minimizing the ∞-Rényi divergence, and give an extensive study of the approximation properties of such models. Finally, we extend our algorithms and results to the framework of sleeping experts.
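To make the comparison class concrete, one natural way to write the regret notion described above is the following, ignoring automaton weights and using symbols introduced here purely for illustration: ℓ_t denotes the round-t loss vector, i_t the learner's choice, and L_T(A) the set of length-T expert sequences accepted by the automaton A.

```latex
R_T \;=\; \sum_{t=1}^{T} \ell_t(i_t) \;-\; \min_{(e_1,\dots,e_T)\,\in\, L_T(\mathcal{A})} \; \sum_{t=1}^{T} \ell_t(e_t)
```

Competing against k-shifting experts is then the special case in which A accepts exactly the expert sequences with at most k switches.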

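As background for the k-shifting case, the sketch below shows the classical fixed-share variant of the weighted-majority (exponentially weighted average) forecaster of Herbster and Warmuth, which the automata-based algorithms above generalize. It is a minimal illustration with illustrative parameter values, not the paper's algorithm.

```python
import numpy as np

def fixed_share(losses, eta=0.5, alpha=0.05):
    """Fixed-share forecaster over N experts (uniform-mixing variant).

    losses: (T, N) array of per-round expert losses in [0, 1].
    eta:    learning rate of the multiplicative update.
    alpha:  switching rate; alpha = 0 recovers plain weighted majority.
    Returns the (T, N) sequence of probability vectors played.
    """
    T, N = losses.shape
    w = np.full(N, 1.0 / N)               # uniform prior over experts
    plays = np.empty((T, N))
    for t in range(T):
        plays[t] = w
        v = w * np.exp(-eta * losses[t])  # multiplicative (Hedge) update
        v /= v.sum()
        # Share step: redistribute a fraction alpha of the mass uniformly,
        # so experts that become good after a shift can recover weight.
        w = (1.0 - alpha) * v + alpha / N
    return plays

# Toy usage: the best expert shifts once, halfway through the horizon.
T, N = 100, 3
losses = np.ones((T, N))
losses[: T // 2, 0] = 0.0   # expert 0 is best early
losses[T // 2 :, 2] = 0.0   # expert 2 is best late
probs = fixed_share(losses)
```

With alpha = 0 the forecaster concentrates on expert 0 and cannot reallocate mass after the shift; a small positive alpha keeps every expert's weight bounded away from zero, which is what yields regret guarantees against k-shifting comparators.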