Paper Information - A Computationally Efficient Information Estimator for Weighted Data

The Shannon information content is a fundamental quantity, and estimating it from an observed dataset is of great importance in statistics, information theory, and machine learning. This study proposes an estimator of the information content from a given set of weighted data, where the empirical data distribution varies with the weights. The notable features of the proposed estimator are its computational efficiency and its ability to handle weighted data. The estimator is then extended to estimate cross entropy, entropy, and KL divergence from weighted data. Finally, the estimators are applied to classification with one-class samples and to distribution-preserving data compression problems.
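For orientation, the quantities named in the abstract can be written out in their standard textbook form. The notation below is an assumption made for illustration and is not taken from the paper: $p$ and $q$ denote densities, $x_1, \dots, x_n$ are the observations, and $w_i \ge 0$ with $\sum_i w_i = 1$ are their weights. The block states only the definitions, not the paper's estimator.

```latex
% Assumed notation (not from the source): p, q are densities;
% x_1, ..., x_n are observations with weights w_i >= 0, \sum_i w_i = 1.
\begin{align*}
I(x; p) &= -\log p(x)
  && \text{(information content)} \\
H(p) &= -\int p(x) \log p(x)\, dx
  && \text{(entropy)} \\
H(p, q) &= -\int p(x) \log q(x)\, dx
  && \text{(cross entropy)} \\
D_{\mathrm{KL}}(p \,\|\, q) &= \int p(x) \log \frac{p(x)}{q(x)}\, dx
  = H(p, q) - H(p)
  && \text{(KL divergence)} \\
\hat{p}_w(x) &= \sum_{i=1}^{n} w_i\, \delta(x - x_i)
  && \text{(weighted empirical distribution)}
\end{align*}
```

Under these definitions, entropy is the expectation of the information content under $p$, so any pointwise estimate $\hat{I}(x_i)$ suggests a weighted average $\sum_i w_i \hat{I}(x_i)$ as an entropy estimate. This is one natural plug-in form and may differ from the construction used in the paper.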

Hideitsu Hino | Noboru Murata

[1] Qing Wang, et al. Divergence Estimation for Multidimensional Densities Via $k$-Nearest-Neighbor Distances, 2009, IEEE Transactions on Information Theory.

[2] Bernhard Schölkopf, et al. Estimating the Support of a High-Dimensional Distribution, 2001, Neural Computation.

[3] Jacob Goldberger, et al. ICA based on a Smooth Estimation of the Differential Entropy, 2008, NIPS.

[4] Gunnar Rätsch, et al. Soft Margins for AdaBoost, 2001, Machine Learning.

[5] John W. Fisher, et al. ICA Using Spacings Estimates of Entropy, 2003, J. Mach. Learn. Res.

[6] Jorma Laaksonen, et al. LVQ_PAK: The Learning Vector Quantization Program Package, 1996.

[7] L. Györfi, et al. Nonparametric entropy estimation. An overview, 1997.

[8] Kurt Hornik, et al. kernlab - An S4 Package for Kernel Methods in R, 2004.

[9] E. Oja, et al. Independent Component Analysis, 2001.

[10] Thomas M. Cover, et al. Elements of Information Theory, 2005.