Convergence properties of functional estimates for discrete distributions

Suppose P is an arbitrary discrete distribution on a countable alphabet 𝒳. Given an i.i.d. sample (X1,…,Xn) drawn from P, we consider the problem of estimating the entropy H(P) or some other functional F=F(P) of the unknown distribution P. We show that, for additive functionals satisfying mild conditions (including the cases of the mean, the entropy, and mutual information), the plug-in estimates of F are universally consistent. We also prove that, without further assumptions, no rate-of-convergence results can be obtained for any sequence of estimators. In the case of entropy estimation, under a variety of different assumptions, we obtain rate-of-convergence results for the plug-in estimate and for a nonparametric estimator based on match-lengths. The behavior of the variance and the expected error of the plug-in estimate is shown to be in sharp contrast to the finite-alphabet case. A number of other important examples of functionals are also treated in some detail. © 2001 John Wiley & Sons, Inc. Random Struct. Alg., 19: 163–193, 2001
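
To make the notion of a plug-in estimate concrete, here is a minimal Python sketch for the entropy case: the unknown P is replaced by the empirical distribution of the sample, and the entropy of that empirical distribution is returned. The function name `plugin_entropy` and the choice of natural logarithms (nats) are illustrative assumptions, not notation taken from the paper.

```python
import math
from collections import Counter


def plugin_entropy(sample):
    """Plug-in entropy estimate, in nats (assumed unit), of an i.i.d. sample.

    The empirical frequency c/n of each observed symbol stands in for its
    unknown probability; symbols never observed contribute nothing.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)  # symbol -> number of occurrences
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

For instance, `plugin_entropy(["a", "b", "a", "b"])` evaluates the entropy of the empirical distribution (1/2, 1/2). The sketch works for any hashable symbols, so the alphabet need not be finite or known in advance.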
