Nonparametric e-Mixture Estimation

This study considers a common situation in data analysis in which only a few observations are available from the distribution of interest (the target distribution), while abundant observations are available from auxiliary distributions. In this situation, it is natural to compensate for the lack of data from the target distribution by using data sets drawn from these auxiliary distributions, in other words, by approximating the target distribution in a subspace spanned by a set of auxiliary distributions. Mixture modeling is one of the simplest ways to integrate information from the target and auxiliary distributions in order to express the target distribution as accurately as possible. There are two typical mixtures in the context of information geometry: the m- and e-mixtures. The m-mixture is applied in a variety of research fields because of the well-known expectation-maximization algorithm for parameter estimation, whereas the e-mixture is rarely used because of the difficulty of its estimation, particularly for nonparametric models. The e-mixture, however, is a well-tempered distribution that satisfies the principle of maximum entropy. To model a target distribution with scarce observations accurately, this letter proposes a novel framework for nonparametric modeling of the e-mixture and a geometrically inspired estimation algorithm. As numerical examples of the proposed framework, a transfer learning setup is considered. The experimental results show that the framework works well for three types of synthetic data sets, as well as a real-world EEG data set.
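
For reference, the two mixture families named above admit a standard textbook formulation, sketched here with generic symbols ($p_i$ for the component densities, $w_i$ for the weights, $b(w)$ for the normalizer); this notation is introduced only for illustration and is not necessarily that of the letter.
\[
p_m(x) = \sum_{i=1}^{k} w_i\, p_i(x), \qquad \sum_{i=1}^{k} w_i = 1,\; w_i \ge 0,
\]
\[
\log p_e(x) = \sum_{i=1}^{k} w_i \log p_i(x) - b(w), \qquad b(w) = \log \int \exp\!\Big(\sum_{i=1}^{k} w_i \log p_i(x)\Big)\, dx .
\]
The m-mixture is the arithmetic (linear) combination of the component densities, while the e-mixture is the geometric (log-linear) combination; the normalizing term $b(w)$ ensures that $p_e$ integrates to one, and evaluating it is a main source of the estimation difficulty mentioned above, especially in the nonparametric setting.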
