Identification of Generalized Semantic Communities in Large Social Networks

Community detection in networks is a fundamental data analysis task. Recently, researchers have tried to improve its performance by exploiting semantic content, and to use that content to interpret the communities. However, existing methods typically assume that communities are assortative (i.e., vertices are mostly connected to others within the same group), so they cannot find generalized community structures, which include assortative communities, disassortative communities (i.e., most connections run between two groups), or a combination of both. In addition, they often assume that each group membership corresponds to a single topic, so they perform poorly when the content is not consistent with the community structure. To address these two issues, we propose a new Bayesian model and develop an efficient variational inference algorithm for it. The model describes the generalized communities and the topical clusters separately, and explores their latent correlation simultaneously so that the two parts reinforce each other. Our model is not only robust to both problems but can also interpret each community using more than one topic. We validate the robustness of this approach on an artificial benchmark and analyze its interpretability through a case study. Finally, we show its superior community detection performance by comparing it with eight state-of-the-art algorithms on eight real networks.
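
The distinction between assortative and disassortative structure can be illustrated with a small simulation. The sketch below is not the Bayesian model proposed here; it samples two toy graphs from a plain stochastic block model and measures the fraction of edges that fall within a group. The function names, group sizes, and connection probabilities are all illustrative assumptions.

```python
import random


def sample_sbm(sizes, block_probs, seed=0):
    """Sample an undirected graph from a simple stochastic block model.

    sizes:       number of vertices in each group (hypothetical values)
    block_probs: block_probs[a][b] is the edge probability between a vertex
                 in group a and a vertex in group b
    """
    rng = random.Random(seed)
    # Assign group labels: the first sizes[0] vertices go to group 0, and so on.
    labels = [g for g, n in enumerate(sizes) for _ in range(n)]
    edges = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if rng.random() < block_probs[labels[i]][labels[j]]:
                edges.append((i, j))
    return labels, edges


def within_group_fraction(labels, edges):
    """Fraction of edges whose two endpoints share a group label."""
    if not edges:
        return 0.0
    same = sum(1 for i, j in edges if labels[i] == labels[j])
    return same / len(edges)


sizes = [50, 50]
# Assortative: dense inside groups, sparse between them.
assortative = [[0.30, 0.02], [0.02, 0.30]]
# Disassortative: sparse inside groups, dense between them.
disassortative = [[0.02, 0.30], [0.30, 0.02]]

labels_a, edges_a = sample_sbm(sizes, assortative, seed=1)
labels_d, edges_d = sample_sbm(sizes, disassortative, seed=1)

frac_a = within_group_fraction(labels_a, edges_a)
frac_d = within_group_fraction(labels_d, edges_d)
print(f"assortative within-group fraction:    {frac_a:.2f}")
print(f"disassortative within-group fraction: {frac_d:.2f}")
```

A method that only rewards dense within-group connectivity would recover the groups in the first graph but has little signal to work with in the second, which is the limitation that generalized community structures are meant to address.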
