Information-theoretic co-clustering

Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory: the optimal co-clustering maximizes the mutual information between the clustered random variables, subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining the row and column clusterings at all stages. Using the practical example of simultaneous word-document clustering, we demonstrate that our algorithm works well in practice, especially in the presence of sparsity and high dimensionality.
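The objective described above can be illustrated with a minimal sketch. It normalizes a contingency table into an empirical joint distribution, aggregates it under given row and column cluster assignments, and compares the mutual information before and after clustering. The function names (`mutual_information`, `cocluster_joint`), the toy table, and the use of NumPy are assumptions made for illustration. The sketch only evaluates the objective for fixed assignments; it is not the co-clustering algorithm itself.

```python
import numpy as np


def mutual_information(table):
    """Mutual information (in nats) of the joint distribution given by a 2-D table."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                        # empirical joint distribution p(x, y)
    px = p.sum(axis=1, keepdims=True)      # row marginal p(x)
    py = p.sum(axis=0, keepdims=True)      # column marginal p(y)
    mask = p > 0                           # treat 0 * log 0 as 0
    return float((p[mask] * np.log(p[mask] / (px @ py)[mask])).sum())


def cocluster_joint(table, row_labels, col_labels, k, l):
    """Joint distribution of the clustered variables for fixed cluster assignments."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    R = np.eye(k)[np.asarray(row_labels)]  # (rows, k) one-hot row-cluster indicators
    C = np.eye(l)[np.asarray(col_labels)]  # (cols, l) one-hot column-cluster indicators
    return R.T @ p @ C                     # sum the mass falling in each co-cluster


# Hypothetical toy table with two obvious blocks.
counts = np.array([[5, 5, 0, 0],
                   [5, 5, 0, 0],
                   [0, 0, 5, 5],
                   [0, 0, 5, 5]])

full_mi = mutual_information(counts)
good_mi = mutual_information(cocluster_joint(counts, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2))
bad_mi = mutual_information(cocluster_joint(counts, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2))

print(f"I(X;Y) = {full_mi:.4f}")
print(f"preserved by block-aligned co-clustering = {good_mi:.4f}")
print(f"preserved by misaligned co-clustering    = {bad_mi:.4f}")
```

In this toy case the block-aligned assignment preserves all of the mutual information, while the misaligned one preserves none; a co-clustering method in the sense of the abstract would search for assignments that keep the preserved value as high as possible.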
