A Novel Generative Topic Embedding Model by Introducing Network Communities

Topic models have many important applications in fields such as Natural Language Processing. Topic embedding modelling aims to introduce word and topic embeddings into topic models in order to describe correlations between topics. Existing topic embedding methods use documents alone and therefore suffer from a topical fuzziness problem caused by the embeddings of semantically fuzzy words, e.g., polysemous words or misleading academic terms. Documents are often connected by links that form document networks. Using these links may alleviate the semantic fuzziness, but links are sparse and noisy and may also mislead topics. In this paper, we use community structure to solve these problems. Communities can alleviate the topical fuzziness of topic embeddings, since communities are often believed to be topic-related. They can also overcome the drawbacks caused by the sparsity and noise of networks, because communities capture high-order network information. We propose a new generative topic embedding model that jointly incorporates documents (with topics) and the network (with communities). The model uses probability transitions to describe the relationship between topics and communities, which makes it robust when topics and communities do not match. We then propose an efficient variational inference algorithm to learn the model. We validate the superiority of the new approach on two tasks: document classification and visualization of topic embeddings.
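The abstract describes the topic-community relationship only as a "probability transition". One plausible reading, offered purely as an illustrative sketch and not as the paper's actual formulation, is a row-stochastic matrix whose entry (c, k) is the probability of topic k given community c. A document's topic mixture is then obtained by marginalizing over its community memberships. The function name `topic_mixture_from_communities` and the C × K matrix layout are hypothetical assumptions. Because the matrix maps C communities onto K topics rather than pairing them one-to-one, C need not equal K, and each community can spread probability over several topics. That is one way such a transition could tolerate topics and communities that do not match.

```python
import numpy as np


def topic_mixture_from_communities(community_probs, transition):
    """Map a community distribution to a topic distribution (illustrative only).

    community_probs: shape (C,), probability vector over C communities.
    transition: shape (C, K), row-stochastic; transition[c, k] is the
        hypothetical probability of topic k given community c.
    Returns a shape (K,) probability vector over K topics.
    """
    community_probs = np.asarray(community_probs, dtype=float)
    transition = np.asarray(transition, dtype=float)
    # Each row must be a valid conditional distribution p(topic | community).
    if not np.allclose(transition.sum(axis=1), 1.0):
        raise ValueError("each row of transition must sum to 1")
    if not np.isclose(community_probs.sum(), 1.0):
        raise ValueError("community_probs must sum to 1")
    # Marginalize over communities: p(k) = sum_c p(c) * p(k | c).
    return community_probs @ transition


# Two communities, three topics: the counts need not match, and each
# community leaks some probability mass to topics other than its main one.
example = topic_mixture_from_communities(
    [0.5, 0.5],
    [[0.8, 0.1, 0.1],
     [0.1, 0.1, 0.8]],
)
print(example)
```

A soft transition like this lets evidence from the network shift a document's topics without forcing every community to correspond to exactly one topic.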
