A Computationally Efficient Information Estimator for Weighted Data

The Shannon information content is a fundamental quantity and it is of great importance to estimate it from observed dataset in the field of statistics, information theory, and machine learning. In this study, an estimator for the information content using a given set of weighted data is proposed. The empirical data distribution varies depending on the weight. The notable features of the proposed estimator are its computational efficiency and its ability to deal with weighted data. The proposed estimator is extended in order to estimate cross entropy, entropy and KL divergence with weighted data. Then, the estimators are applied to classification with one-class samples, and distribution preserving data compression problems.