Incorporating Entities in News Topic Modeling

News articles convey information by centering on named entities: who, when, and where. Extracting the relationships among entities, words, and topics from a large collection of news articles, however, is nontrivial. Topic models such as Latent Dirichlet Allocation have been widely applied to mine hidden topics in text and have achieved considerable performance, but they cannot explicitly show the relationship between words and entities. In this paper, we propose a generative model, the Entity-Centered Topic Model (ECTM), which summarizes the correlation among entities, words, and topics by treating each entity topic as a mixture of word topics. Experiments on real news data sets show that our model achieves lower perplexity and better entity clustering than the state-of-the-art entity topic model CorrLDA2. We also analyze the results of ECTM and compare them further with those of CorrLDA2.
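
For readers unfamiliar with the perplexity measure mentioned above, the following is a minimal sketch of how held-out perplexity is commonly computed for a generic topic model. It is not the ECTM or CorrLDA2 implementation. The function names and the toy parameters `theta` (document-topic weights) and `phi` (topic-word probabilities) are assumptions introduced only for illustration.

```python
import math

def token_prob(theta, phi, word):
    """p(word | doc) under a generic topic mixture: sum_k theta[k] * phi[k][word].

    theta and phi are hypothetical toy parameters, not values from the paper.
    """
    return sum(theta[k] * phi[k][word] for k in range(len(theta)))

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    avg_neg_log_lik = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_lik)

# Toy example: two topics over a three-word vocabulary.
theta = [0.5, 0.5]
phi = [
    {"election": 0.8, "vote": 0.1, "goal": 0.1},
    {"election": 0.1, "vote": 0.1, "goal": 0.8},
]
held_out = ["election", "goal", "vote"]
log_probs = [math.log(token_prob(theta, phi, w)) for w in held_out]
print(perplexity(log_probs))
```

Lower perplexity means the model assigns higher probability to unseen text, which is the sense in which one model is reported as better than another.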
