Finding Communities with Hierarchical Semantics by Distinguishing General and Specialized Topics

Using both network topology and semantic content to find topic-related communities is a recent trend in community detection. By analyzing texts in social networks, we find that topics in networked content are often hierarchical. In most cases, they have a two-level semantic structure of general and specialized topics, which denote the common and specific interests of communities, respectively. However, existing community detection methods ignore this hierarchy and treat all words used to describe node semantics from a single perspective. This indiscriminate use of words is an inherent weakness in describing networked content, because the deep semantics are not fully exploited. To address this problem, we propose a novel probabilistic generative model. By distinguishing the general and specialized topics of words, our model not only finds community structures more accurately but also provides a two-level semantic interpretation for each community. We train the model by deriving an efficient inference method within the variational expectation-maximization framework. A case study shows that our algorithm yields deep semantic interpretations of communities. Its superiority for community detection is further demonstrated in comparison with eight state-of-the-art algorithms on eight real-world networks.
