A Nonparametric Clustering Algorithm with a Quantile-Based Likelihood Estimator

Clustering is a representative task in unsupervised learning and one of the important approaches in exploratory data analysis. By its very nature, clustering that does not rely on strong assumptions about the data distribution is desirable. Information-theoretic clustering is a class of clustering methods that optimize information-theoretic quantities such as entropy and mutual information. Because these quantities can be estimated nonparametrically, information-theoretic clustering algorithms are capable of capturing various intrinsic data structures. Information-theoretic quantities can also be estimated from a data set in which each datum carries a sampling weight. By assuming that the data set is sampled from a certain cluster and assigning different sampling weights depending on the cluster, cluster-conditional information-theoretic quantities can be estimated. In this letter, a simple iterative clustering algorithm is proposed based on a nonparametric estimator of the log-likelihood for weighted data sets. The clustering algorithm is also derived from the principle of conditional entropy minimization with maximum entropy regularization. The proposed algorithm contains no tuning parameters, and experiments show that it is comparable to or outperforms conventional nonparametric clustering methods.
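To make the estimator idea concrete, the following is a minimal sketch, assuming a nearest-neighbor-style construction: the log-likelihood at a point x is inferred from the radius within which the cumulative sampling weight of its neighbors reaches a quantile level alpha. The function name, the alpha parameter, and the ball-volume normalization are illustrative assumptions, not the paper's exact formulation.

import numpy as np
from scipy.special import gammaln

def weighted_quantile_log_likelihood(X, w, x, alpha=0.1):
    """Sketch of a quantile-based log-likelihood estimate at x.

    X: (n, d) observations; w: sampling weights summing to 1 (hypothetical
    interface). eps is the smallest radius whose enclosed neighbors carry
    cumulative weight >= alpha; the density estimate is alpha divided by
    the volume of the d-dimensional ball of radius eps.
    """
    d = X.shape[1]
    dists = np.linalg.norm(X - x, axis=1)
    order = np.argsort(dists)
    cum_w = np.cumsum(w[order])
    idx = min(np.searchsorted(cum_w, alpha), len(dists) - 1)
    eps = max(dists[order[idx]], 1e-12)  # guard against a zero radius
    # log volume of a d-dimensional ball of radius eps
    log_vol = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1) + d * np.log(eps)
    return np.log(alpha) - log_vol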

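The iterative loop below is likewise only a sketch of the clustering scheme the abstract describes: each point is reassigned to the cluster that maximizes its estimated cluster-conditional log-likelihood plus the log cluster proportion, with uniform within-cluster weights standing in for the cluster-dependent sampling weights. The initialization, the alpha level inherited from the estimator above, and the convergence test are assumptions for illustration; the paper's algorithm itself is stated to have no tuning parameter.

def iterative_clustering(X, n_clusters, alpha=0.1, n_iter=50, seed=0):
    n = len(X)
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)  # random initial assignment
    for _ in range(n_iter):
        new_labels = labels.copy()
        for i in range(n):
            scores = np.full(n_clusters, -np.inf)
            for c in range(n_clusters):
                # uniform sampling weights on cluster c, excluding point i
                mask = (labels == c) & (np.arange(n) != i)
                if mask.sum() == 0:
                    continue
                ll = weighted_quantile_log_likelihood(
                    X, mask / mask.sum(), X[i], alpha)
                scores[c] = np.log(mask.sum() / (n - 1)) + ll
            new_labels[i] = int(np.argmax(scores))
        if np.array_equal(new_labels, labels):  # no reassignment: converged
            break
        labels = new_labels
    return labels

As a usage example, iterative_clustering(X, 3) on an (n, d) array X returns integer cluster labels in {0, 1, 2}.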