Laplace's Rule of Succession in Information Geometry

Laplace's "add-one" rule of succession modifies the observed frequencies in a sequence of heads and tails by adding one to the count of each outcome. This improves prediction by avoiding zero probabilities and corresponds to a uniform Bayesian prior on the parameter. The canonical Jeffreys prior corresponds to the "add-one-half" rule. We prove that, for exponential families of distributions, such Bayesian predictors can be approximated by taking the average of the maximum likelihood predictor and the \emph{sequential normalized maximum likelihood} predictor from information theory. Thus, in this case, Bayesian predictors can be approximated without the cost of integrating or sampling in parameter space.
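
To make the statement concrete in the simplest case, the following Python sketch compares the predictors for a coin (Bernoulli model) after observing k heads in n tosses. It is an illustration under stated assumptions, not the general construction: the function names are hypothetical, and the sequential normalized maximum likelihood predictor is assumed to be the variant that scores each candidate next outcome by its probability under the maximum likelihood parameter re-fitted with that outcome appended, and then normalizes the scores. Under that assumption it coincides with the add-one rule for a coin.

```python
from fractions import Fraction


def ml_predictor(k, n):
    """Maximum likelihood probability of heads after k heads in n tosses."""
    return Fraction(k, n)


def add_constant_predictor(k, n, c):
    """Add-c rule: c = 1 is Laplace's rule, c = 1/2 is the add-one-half rule."""
    c = Fraction(c)
    return (k + c) / (n + 2 * c)


def snml_predictor(k, n):
    """Assumed SNML variant for a coin.

    Each candidate next outcome is scored by its probability under the
    maximum likelihood parameter re-fitted with that outcome appended;
    the two scores are then normalized to sum to one.
    """
    score_heads = Fraction(k + 1, n + 1)      # ML estimate if the next toss is heads
    score_tails = Fraction(n - k + 1, n + 1)  # ML estimate of tails if the next toss is tails
    return score_heads / (score_heads + score_tails)


def averaged_predictor(k, n):
    """Average of the maximum likelihood and (assumed) SNML predictors."""
    return (ml_predictor(k, n) + snml_predictor(k, n)) / 2


def gap_to_add_one_half(k, n):
    """Difference between the add-one-half rule and the averaged predictor."""
    return add_constant_predictor(k, n, Fraction(1, 2)) - averaged_predictor(k, n)


if __name__ == "__main__":
    # Keep the observed frequency fixed at 0.3 and let the sample size grow.
    for n in (10, 100, 1000):
        k = 3 * n // 10
        print(n, float(averaged_predictor(k, n)), float(gap_to_add_one_half(k, n)))
```

With 3 heads in 10 tosses, the averaged predictor is 19/60 and the add-one-half rule gives 7/22, a gap of 1/660. In this coin example the gap works out to (n - 2k) / (2n(n+1)(n+2)), so at a fixed observed frequency it shrinks on the order of 1/n^2, and no integration over the parameter is needed to compute the averaged predictor.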