Community discovery using nonnegative matrix factorization

Complex networks exist in a wide range of real-world systems, such as social networks, technological networks, and biological networks. Over the last decades, many researchers have studied properties shared by these large networks, including the small-world property, power-law degree distributions, and network connectivity. In this paper, we investigate another important issue in network analysis: community discovery. We choose Nonnegative Matrix Factorization (NMF) as our tool for finding communities because of its strong interpretability and its close relationship with clustering methods. Targeting different types of networks (undirected, directed, and compound), we propose three NMF techniques (Symmetric NMF, Asymmetric NMF, and Joint NMF, respectively). We also study the correctness and convergence properties of these algorithms. Finally, experiments on real-world networks are presented to demonstrate the effectiveness of the proposed methods.
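For an undirected network, the idea behind Symmetric NMF can be illustrated by approximating the symmetric adjacency matrix A by H H^T with H >= 0, then assigning node i to the community with the largest entry in row i of H. The Python/NumPy sketch below assumes the squared Frobenius objective and a damped multiplicative update with beta = 1/2 (a common choice in the symmetric NMF literature); it is not presented as this paper's exact algorithm or convergence analysis. The function names `symmetric_nmf` and `communities_from_H` and the toy graph are hypothetical.

```python
import numpy as np


def symmetric_nmf(A, k, n_iter=500, beta=0.5, seed=0, eps=1e-12):
    """Approximate a symmetric nonnegative matrix A by H @ H.T with H >= 0.

    Assumed update (damped multiplicative rule, beta = 0.5 by default):
        H <- H * (1 - beta + beta * (A H) / (H H^T H))
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    H = rng.random((n, k))  # strictly positive random start
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + eps  # eps guards against division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H


def communities_from_H(H):
    """Assign each node to the column (community) with its largest weight."""
    return np.argmax(H, axis=1)


# Toy undirected graph (hypothetical): two 4-node cliques joined by edge 3-4.
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

H = symmetric_nmf(A, k=2)
labels = communities_from_H(H)
print(labels)
```

Which cluster index each clique receives depends on the random start; only the grouping is meaningful.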
