Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families

We study online learning under logarithmic loss with regular parametric models. Hedayati and Bartlett (2012b) showed that a Bayesian prediction strategy with Jeffreys prior and sequential normalized maximum likelihood (SNML) coincide and are optimal if and only if the latter is exchangeable, and if and only if the optimal strategy can be computed without knowing the time horizon in advance. They left open the question of which families admit exchangeable SNML strategies. This paper fully answers that question for one-dimensional exponential families: exchangeability can occur only for three classes of natural exponential family distributions, namely the Gaussian, the Gamma, and the Tweedie exponential family of order 3/2.

Keywords: SNML Exchangeability, Exponential Family, Online Learning, Logarithmic Loss, Bayesian Strategy, Jeffreys Prior, Fisher Information
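As a concrete illustration of the exchangeability property at issue, the following sketch (not from the paper; the unit-variance Gaussian location family and all function names are my own choices) computes SNML one-step predictions numerically — maximized likelihood including the candidate next outcome, renormalized over that outcome — and checks that the joint SNML probability of a sequence is invariant under permuting outcomes. Since the first SNML prediction is improper for this family, the check conditions on the first observation.

```python
# Numerical sanity check (a sketch, not the paper's code): SNML for the
# unit-variance Gaussian location family is exchangeable, i.e. the joint
# SNML density of a sequence, conditioned on the first point, does not
# depend on the order of the remaining outcomes.
import numpy as np

def max_loglik(xs):
    # sup over the mean parameter of the Gaussian log-likelihood,
    # attained at the sample mean
    xs = np.asarray(xs, dtype=float)
    mu = xs.mean()
    return -0.5 * len(xs) * np.log(2 * np.pi) - 0.5 * np.sum((xs - mu) ** 2)

def snml_density(y, past, grid):
    # SNML one-step predictive: maximized likelihood with y appended,
    # normalized over all candidate next outcomes (simple Riemann sum)
    num = np.exp(max_loglik(list(past) + [y]))
    vals = np.array([np.exp(max_loglik(list(past) + [t])) for t in grid])
    den = vals.sum() * (grid[1] - grid[0])
    return num / den

grid = np.linspace(-12.0, 12.0, 2001)
x1, a, b = 0.3, 1.1, -0.7

# Joint SNML density of (a, b) given x1, for both orderings
p_ab = snml_density(a, [x1], grid) * snml_density(b, [x1, a], grid)
p_ba = snml_density(b, [x1], grid) * snml_density(a, [x1, b], grid)
assert abs(p_ab - p_ba) < 1e-6 * p_ab  # the two orderings agree
```

Replacing the Gaussian here with a family outside the three classes identified in the paper (e.g. Bernoulli) would make the two orderings disagree, which is the paper's point.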

[1]  J. Rissanen. Modeling by Shortest Data Description. Automatica, 1978.

[2]  O. Barndorff-Nielsen. Information and Exponential Families in Statistical Theory, 1980.

[3]  C. Morris. Natural Exponential Families with Quadratic Variance Functions, 1982.

[4]  H. Rademacher. On the Approximation of Irrational Numbers by Rational Numbers, 1983.

[5]  J. Rissanen. Fisher Information and Stochastic Complexity. IEEE Trans. Inf. Theory, 1996.

[6]  J. Rissanen et al. The Minimum Description Length Principle in Coding and Modeling. IEEE Trans. Inf. Theory, 1998.

[7]  A. Sen et al. The Theory of Dispersion Models. Technometrics, 1997.

[8]  E. Takimoto and M. K. Warmuth. The Minimax Strategy for Gaussian Density Estimation. COLT, 2000.

[9]  S. Verdú et al. The Minimum Description Length Principle in Coding and Modeling, 2000.

[10]  M. K. Warmuth et al. The Last-Step Minimax Algorithm. ALT, 2000.

[11]  I. Csiszár et al. Information Projections Revisited. IEEE Trans. Inf. Theory, 2000.

[12]  F. Liang et al. Exact Minimax Strategies for Predictive Density Estimation, Data Compression, and Model Selection. IEEE Trans. Inf. Theory, 2002.

[13]  S. M. Kakade et al. Worst-Case Bounds for Gaussian Process Models. NIPS, 2005.

[14]  G. Lugosi et al. Prediction, Learning, and Games, 2006.

[15]  J. Rissanen et al. Conditional NML Universal Models. Information Theory and Applications Workshop, 2007.

[16]  J. Rissanen et al. On Sequentially Normalized Maximum Likelihood Models, 2008.

[17]  J. Rissanen. Minimum Description Length Principle. Encyclopedia of Machine Learning, 2010.

[18]  W. Kotlowski et al. Maximum Likelihood vs. Sequential Normalized Maximum Likelihood in On-line Density Estimation. COLT, 2011.

[19]  F. Hedayati and P. Bartlett. The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood Estimators. COLT, 2012.