Overlapping Communities Detection via Measure Space Embedding

We present a new algorithm for community detection. The algorithm uses random walks to embed the graph in a space of measures, after which a modification of $k$-means in that space is applied. The algorithm is therefore fast and easily parallelizable. We evaluate the algorithm on standard random graph benchmarks, including some overlapping community benchmarks, and find its performance to be better or at least as good as previously known algorithms. We also prove a linear time (in number of edges) guarantee for the algorithm on a $p,q$-stochastic block model with $p \geq c\cdot N^{-\frac{1}{2} + \epsilon}$ and $p-q \geq c' \sqrt{p N^{-\frac{1}{2} + \epsilon} \log N}$.

[1]  Kathryn B. Laskey,et al.  Stochastic blockmodels: First steps , 1983 .

[2]  Santo Fortunato,et al.  Community detection algorithms: a comparative analysis: invited presentation, extended abstract , 2009, VALUETOOLS.

[3]  Mark E. J. Newman,et al.  An efficient and principled method for detecting communities in networks , 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[4]  Lada A. Adamic,et al.  The political blogosphere and the 2004 U.S. election: divided they blog , 2005, LinkKDD '05.

[5]  Andrea Lancichinetti,et al.  Community detection algorithms: a comparative analysis: invited presentation, extended abstract , 2009, VALUETOOLS.

[6]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[7]  Fan Chung Graham,et al.  Spectral Clustering of Graphs with General Degrees in the Extended Planted Partition Model , 2012, COLT.

[8]  Martin Rosvall,et al.  Maps of random walks on complex networks reveal community structure , 2007, Proceedings of the National Academy of Sciences.

[9]  F. Radicchi,et al.  Benchmark graphs for testing community detection algorithms. , 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[10]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  W. Zachary,et al.  An Information Flow Model for Conflict and Fission in Small Groups , 1977, Journal of Anthropological Research.

[12]  Martin Rosvall,et al.  Compression of flow can reveal overlapping modular organization in networks , 2011, ArXiv.

[13]  Alan M. Frieze,et al.  Random graphs , 2006, SODA '06.

[14]  Ron Shamir,et al.  Improved algorithms for the random cluster graph model , 2002, Random Struct. Algorithms.

[15]  Philip M. Long,et al.  Finding Planted Partitions in Nearly Linear Time using Arrested Spectral Clustering , 2010, ICML.

[16]  Mark E. J. Newman,et al.  Stochastic blockmodels and community structure in networks , 2010, Physical review. E, Statistical, nonlinear, and soft matter physics.

[17]  Matthieu Latapy,et al.  Computing Communities in Large Networks Using Random Walks , 2004, J. Graph Algorithms Appl..

[18]  Svante Janson,et al.  Random graphs , 2000, ZOR Methods Model. Oper. Res..

[19]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[20]  W. L. Nicholson,et al.  On the Normal Approximation to the Hypergeometric Distribution , 1956 .

[21]  Andrea Lancichinetti,et al.  Detecting the overlapping and hierarchical community structure in complex networks , 2008, 0802.1218.

[22]  Jean-Loup Guillaume,et al.  Fast unfolding of communities in large networks , 2008, 0803.0476.

[23]  Mark Jerrum,et al.  The Metropolis Algorithm for Graph Bisection , 1998, Discret. Appl. Math..

[24]  Feller William,et al.  An Introduction To Probability Theory And Its Applications , 1950 .

[25]  Steve Gregory,et al.  Finding overlapping communities in networks by label propagation , 2009, ArXiv.

[26]  T. Vicsek,et al.  Uncovering the overlapping community structure of complex networks in nature and society , 2005, Nature.

[27]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[28]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[29]  Andrea Lancichinetti,et al.  Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. , 2009, Physical review. E, Statistical, nonlinear, and soft matter physics.

[30]  Ravi B. Boppana,et al.  Eigenvalues and graph bisection: An average-case analysis , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[31]  Jianbo Shi,et al.  Learning Segmentation by Random Walks , 2000, NIPS.

[32]  P. Ronhovde,et al.  Multiresolution community detection for megascale networks by information-based replica correlations. , 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[33]  E A Leicht,et al.  Mixture models and exploratory analysis in networks , 2006, Proceedings of the National Academy of Sciences.

[34]  Thomas M. Cover,et al.  Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing) , 2006 .

[35]  John G. van Bosse,et al.  Wiley Series in Telecommunications and Signal Processing , 2006 .

[36]  Richard M. Karp,et al.  Algorithms for graph partitioning on the planted partition model , 2001, Random Struct. Algorithms.

[37]  Jean-Charles Delvenne,et al.  Stability of graph communities across time scales , 2008, Proceedings of the National Academy of Sciences.

[38]  Boleslaw K. Szymanski,et al.  Overlapping community detection in networks: The state-of-the-art and comparative study , 2011, CSUR.

[39]  David M Blei,et al.  Efficient discovery of overlapping communities in massive networks , 2013, Proceedings of the National Academy of Sciences.

[40]  Anima Anandkumar,et al.  A Tensor Spectral Approach to Learning Mixed Membership Community Models , 2013, COLT.