Paper information - Approximation Algorithms for Tensor Clustering

Approximation Algorithms for Tensor Clustering

We present the first (to our knowledge) approximation algorithm for tensor clustering, a powerful generalization of basic 1D clustering. Tensors are increasingly common in modern applications dealing with complex heterogeneous data, and clustering them is a fundamental tool for data analysis and pattern discovery. Like their 1D cousins, common tensor clustering formulations are NP-hard to optimize. Unlike the 1D case, however, no approximation algorithms seem to be known. We address this imbalance and build on recent co-clustering work to derive a tensor clustering algorithm with approximation guarantees, allowing metrics and divergences (e.g., Bregman) as objective functions. We thereby answer two open questions posed by Anagnostopoulos et al. (2008). Our analysis yields a constant approximation factor independent of data size; a worst-case example shows this factor to be tight for Euclidean co-clustering. Empirically, however, the approximation factor is observed to be conservative, so our method can also be used in practice.
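To make the kind of objective discussed above concrete, the following is a minimal Python sketch of one common Euclidean formulation: every mode of the tensor gets its own clustering, the per-mode labels of an entry determine its block, and the cost is the sum of squared deviations of entries from their block mean. The function name, the dict-of-tuples tensor representation, and the toy data are illustrative assumptions and are not taken from the paper. The sketch only evaluates the cost of a given clustering; it does not implement the approximation algorithm itself.

```python
from collections import defaultdict


def tensor_clustering_cost(tensor, assignments):
    """Sum of squared deviations of each entry from its block mean.

    tensor: dict mapping an index tuple (i_1, ..., i_p) to a float.
    assignments: one list per mode; assignments[m][i] is the cluster
        label of index i along mode m (hypothetical representation).
    """
    blocks = defaultdict(list)
    for index, value in tensor.items():
        # An entry's block is identified by the cluster label of each index.
        block = tuple(assignments[m][i] for m, i in enumerate(index))
        blocks[block].append(value)

    cost = 0.0
    for values in blocks.values():
        mean = sum(values) / len(values)
        cost += sum((v - mean) ** 2 for v in values)
    return cost


# Toy 2 x 2 x 2 tensor whose entries depend only on the first index.
tensor = {(i, j, k): float(10 * i)
          for i in range(2) for j in range(2) for k in range(2)}

# Splitting the first mode into two clusters matches the structure exactly.
good = [[0, 1], [0, 0], [0, 0]]
# Putting everything into a single block leaves residual error.
bad = [[0, 0], [0, 0], [0, 0]]

print(tensor_clustering_cost(tensor, good))  # 0.0
print(tensor_clustering_cost(tensor, bad))   # 200.0
```

With two modes, the same function evaluates an ordinary co-clustering cost; a Bregman divergence could be substituted for the squared difference, with the divergence argument order chosen so the block mean remains the optimal representative.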

Suvrit Sra | Arindam Banerjee | Stefanie Jegelka

[1] Richard Nock, et al. Mixed Bregman Clustering with Approximation Guarantees, 2008, ECML/PKDD.

[2] Johannes Blömer, et al. Coresets and approximate clustering for Bregman divergences, 2009, SODA.

[3] Venu Madhav Govindu, et al. A tensor decomposition for geometric grouping and segmentation, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[4] Suvrit Sra, et al. Minimum Sum-Squared Residue based clustering of Gene Expression Data, 2004.

[5] Andrzej Stachurski, et al. Parallel Optimization: Theory, Algorithms and Applications, 2000, Scalable Comput. Pract. Exp.

[6] Shai Ben-David, et al. A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering, 2007, Machine Learning.

[7] Chris Ding, et al. Proceedings of the 2nd Workshop on Data Mining using Matrices and Tensors, 2009, KDD 2009.

[8] Peter A. Flach, et al. Evaluation Measures for Multi-class Subgroup Discovery, 2009, ECML/PKDD.

[9] Pietro Perona, et al. Beyond pairwise clustering, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[10] Vin de Silva, et al. Tensor rank and the ill-posedness of the best low-rank approximation problem, 2006, math/0607647.

[11] Tamara G. Kolda, et al. Tensor Decompositions and Applications, 2009, SIAM Rev.

[12] Sergei Vassilvitskii, et al. k-means++: the advantages of careful seeding, 2007, SODA '07.

[13] Inderjit S. Dhillon, et al. Information-theoretic co-clustering, 2003, KDD '03.

[14] Anirban Dasgupta, et al. Approximation algorithms for co-clustering, 2008, PODS.

[15] Marcel R. Ackermann, et al. Clustering for metric and non-metric distance measures, 2008, SODA '08.

[16] J. Hartigan. Direct Clustering of a Data Matrix, 1972.

[17] Joseph T. Chang, et al. Spectral biclustering of microarray data: coclustering genes and conditions, 2003, Genome research.

[18] Arindam Banerjee, et al. Approximation Algorithms for Bregman Clustering Co-clustering and Tensor Clustering, 2008.

[19] Andrew McGregor, et al. Finding Metric Structure in Information Theoretic Clustering, 2008, COLT.

[20] Gemma C. Garriga, et al. An approximation ratio for biclustering, 2008, Inf. Process. Lett.

[21] Suvrit Sra, et al. Approximation Algorithms for Bregman Co-clustering and Tensor Clustering, 2008, ArXiv.