Individual Sequence Prediction Using Memory-Efficient Context Trees

Context trees are a popular and effective tool for tasks such as compression, sequential prediction, and language modeling. We present an algebraic perspective on context trees for the task of individual sequence prediction. Our approach stems from a generalization of the notion of margin used for linear predictors. By extending the concept of margin to context trees, we cast the individual sequence prediction problem as the task of finding a linear separator in a Hilbert space, which lets us apply techniques from machine learning and online optimization. Our main contribution is a memory-efficient adaptation of the perceptron algorithm for individual sequence prediction. We name our algorithm the shallow perceptron and prove a shifting mistake bound, which relates its performance to that of any sequence of context trees. We also prove that the shallow perceptron grows a context tree at a rate that is upper bounded by its mistake rate, which in turn bounds the size of the trees grown by our algorithm.
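To make the idea of a margin-based predictor over a context tree concrete, the following is a minimal sketch and not the shallow perceptron itself. It assumes a binary alphabet of +1/-1 symbols, a fixed depth cap `max_depth`, and a per-level discount `decay`; the function name and all three parameters are hypothetical choices made for illustration. Each tree node is a suffix of the history and holds one weight, the prediction is the sign of the discounted sum of the weights along the current path, and nodes are created or updated only when a mistake occurs.

```python
from typing import Dict, Sequence, Tuple


def run_context_tree_perceptron(
    sequence: Sequence[int], max_depth: int = 3, decay: float = 0.5
) -> Tuple[int, Dict[Tuple[int, ...], float]]:
    """Predict a +1/-1 sequence online with a perceptron over suffix contexts.

    Illustrative sketch only: max_depth and decay are assumed parameters.
    Returns the number of mistakes and the weights stored in the tree.
    """
    weights: Dict[Tuple[int, ...], float] = {}
    mistakes = 0
    for t, y in enumerate(sequence):
        # The last max_depth symbols seen before time t.
        history = tuple(sequence[max(0, t - max_depth):t])
        # Nodes on the current path: the empty context (root) first,
        # then progressively longer suffixes of the history.
        contexts = [history[len(history) - d:] for d in range(len(history) + 1)]
        # Margin: discounted sum of the weights along the path.
        score = sum(decay ** len(c) * weights.get(c, 0.0) for c in contexts)
        prediction = 1 if score >= 0 else -1
        if prediction != y:
            mistakes += 1
            # Perceptron-style update; missing nodes are created here,
            # so the tree grows only on rounds where a mistake is made.
            for c in contexts:
                weights[c] = weights.get(c, 0.0) + y * decay ** len(c)
    return mistakes, weights
```

Because nodes are added only on mistake rounds, and each such round touches at most `max_depth + 1` nodes, the number of stored weights in this sketch is at most `mistakes * (max_depth + 1)`. This loosely mirrors the relation between tree growth and mistake rate stated above, under the simplifying assumption of a fixed depth cap.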
