Sharp performance bounds for graph clustering via convex optimization

The problem of finding clusters in a graph arises in several applications, such as social networks, data mining, and computer networks. A typical convex-optimization approach is to find a sparse-plus-low-rank decomposition of the graph's adjacency matrix, with the (dense) low-rank component representing the clusters. In this paper, we sharply characterize the conditions under which this approach successfully identifies the clusters. In particular, we introduce the “effective density” of a cluster, which measures its significance, and we derive explicit upper and lower bounds on the minimum effective density that demarcate the regions of success and failure of this technique. Our conditions are stated in terms of (a) the size of the clusters, (b) the denseness of the graph, and (c) the regularization parameter of the convex program. We also present extensive simulations that corroborate our theoretical findings.
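The sparse-plus-low-rank idea can be illustrated with a minimal sketch. It uses the generic robust-PCA program (minimize the nuclear norm of L plus λ times the entrywise ℓ1 norm of S, subject to L + S = A) and solves it with an inexact augmented Lagrange multiplier iteration. This is an assumption-laden illustration, not the exact constrained program analyzed in the paper. The default λ = 1/√n, the planted two-cluster graph with 3% symmetric edge flips, and the helper names (`low_rank_plus_sparse`, `soft_threshold`, `singular_value_threshold`) are all hypothetical choices made for the example.

```python
import numpy as np


def soft_threshold(X, tau):
    """Entrywise shrinkage: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Shrink singular values: proximal operator of tau * ||X||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def low_rank_plus_sparse(A, lam=None, tol=1e-7, max_iter=500):
    """Hypothetical helper: solve min ||L||_* + lam * ||S||_1 s.t. L + S = A
    with an inexact augmented Lagrange multiplier (ALM) iteration."""
    n1, n2 = A.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))  # assumed default, common for robust PCA
    norm_A = np.linalg.norm(A, "fro")
    spec_A = np.linalg.norm(A, 2)  # largest singular value of A
    Y = A / max(spec_A, np.abs(A).max() / lam)  # dual variable initialization
    mu = 1.25 / spec_A  # penalty parameter, increased geometrically
    rho, mu_max = 1.5, mu * 1e7
    L = np.zeros_like(A)
    S = np.zeros_like(A)
    for _ in range(max_iter):
        L = singular_value_threshold(A - S + Y / mu, 1.0 / mu)  # low-rank step
        S = soft_threshold(A - L + Y / mu, lam / mu)  # sparse step
        residual = A - L - S
        Y = Y + mu * residual  # dual ascent on the constraint L + S = A
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual, "fro") / norm_A < tol:
            break
    return L, S


# Hypothetical example: two planted clusters of 50 nodes each.
rng = np.random.default_rng(0)
n, k = 100, 2
labels = np.repeat(np.arange(k), n // k)
L0 = (labels[:, None] == labels[None, :]).astype(float)  # block-diagonal ones, rank k
flip = np.triu(rng.random((n, n)) < 0.03, 1)
flip = flip | flip.T  # symmetric noise; the diagonal is left untouched
A = np.where(flip, 1.0 - L0, L0)  # missing intra-cluster and spurious inter-cluster edges

L_hat, S_hat = low_rank_plus_sparse(A)
recovered = L_hat > 0.5  # read cluster membership off the low-rank component
```

Rounding `L_hat` at 0.5 is just one simple way to read the clusters off the low-rank component. With heavier noise, sparser clusters, or a poorly chosen λ, this kind of recovery can fail, which is the success/failure regime the abstract's conditions on cluster size, graph density, and the regularization parameter are meant to delimit.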
