MDL convergence speed for Bernoulli sequences

We study the Minimum Description Length (MDL) principle for online sequence estimation/prediction in a proper learning setup. If the underlying model class is discrete, the total expected square loss is a particularly interesting performance measure: (a) this quantity is bounded by a finite constant, implying convergence with probability one, and (b) it additionally characterizes the speed of convergence. For MDL, in general one can only obtain loss bounds that are finite but exponentially larger than those for Bayes mixtures. We show that this is the case even if the model class contains only Bernoulli distributions. We derive a new upper bound on the prediction error for countable Bernoulli classes, which implies a small bound (comparable to the one for Bayes mixtures) for certain important model classes. We discuss applications to machine learning tasks such as classification and hypothesis testing, as well as the generalization to countable classes of i.i.d. models.
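To make the contrasted bounds concrete, here is a sketch in standard notation (the symbols and the constant c below are our own assumptions, not taken from the abstract): let \mathcal{M} be a countable class of Bernoulli distributions with prior weights w_\nu > 0, \sum_{\nu} w_\nu \le 1, let \mu \in \mathcal{M} be the true distribution, and let \hat{\nu}_t = \arg\max_{\nu \in \mathcal{M}} w_\nu\, \nu(x_{1:t}) be the two-part MDL estimator after t observations. Writing \xi = \sum_{\nu} w_\nu \nu for the Bayes mixture, the total expected square losses of the two predictors admit bounds of the form

\[
  \sum_{t=0}^{\infty} \mathbf{E}\Big[\big(\hat{\nu}_t(1 \mid x_{1:t}) - \mu(1 \mid x_{1:t})\big)^2\Big] \;\le\; c \cdot w_\mu^{-1}
  \qquad \text{(MDL, general case)},
\]
\[
  \sum_{t=0}^{\infty} \mathbf{E}\Big[\big(\xi(1 \mid x_{1:t}) - \mu(1 \mid x_{1:t})\big)^2\Big] \;\le\; \ln w_\mu^{-1}
  \qquad \text{(Bayes mixture)}.
\]

Since w_\mu^{-1} = \exp(\ln w_\mu^{-1}), the generic MDL bound is exponentially larger than the Bayes-mixture bound; the new bound for countable Bernoulli classes closes this gap for certain important model classes.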
