Expectation Maximization for Clustering on Hyperspheres

High-dimensional directional data are becoming increasingly important in contemporary applications such as the analysis of text and gene-expression data. A natural model for multivariate directional data is provided by the von Mises-Fisher (vMF) distribution on the unit hypersphere, which is analogous to the multivariate Gaussian distribution in R^d. In this paper, we propose modeling complex directional data as a mixture of vMF distributions. We derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the parameters of this mixture, and we propose two clustering algorithms corresponding to these variants. An interesting aspect of our methodology is that the spherical kmeans algorithm (kmeans with cosine similarity) can be shown to be a special case of both our algorithms. Thus, modeling text data by vMF distributions lends theoretical validity to the use of cosine similarity, which has been widely used by the information retrieval community. As experimental validation, we provide several results on modeling high-dimensional text and gene-expression data. The results indicate that our approach yields superior clusterings, especially for difficult clustering tasks in high-dimensional spaces.
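To make the spherical kmeans connection concrete, the following is a minimal NumPy sketch of kmeans with cosine similarity on L2-normalized vectors. It illustrates only the special case referred to above (the hard-assignment, large-concentration limit of the vMF mixture), not the paper's EM algorithms themselves; the function name `spherical_kmeans` and its parameters are introduced here purely for exposition.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Sketch of spherical kmeans (kmeans with cosine similarity).

    X is an (n, d) array. Rows are L2-normalized so that a dot product
    equals cosine similarity; cluster centroids are mean directions
    re-projected onto the unit hypersphere.
    """
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)          # project data onto the unit hypersphere
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial mean directions
    for _ in range(n_iter):
        sims = X @ centroids.T                                # cosine similarities, shape (n, k)
        labels = sims.argmax(axis=1)                          # hard-assign each point to the closest direction
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue                                      # keep the old centroid if the cluster is empty
            m = members.sum(axis=0)
            centroids[j] = m / np.linalg.norm(m)              # renormalize the sum to get the mean direction
    return labels, centroids

# Illustrative usage on random vectors:
# X = np.random.randn(100, 20)
# labels, centroids = spherical_kmeans(X, k=3)
```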
