Infobox suggestion for Wikipedia entities

Given the sheer amount of work and expertise required to author Wikipedia articles, automatic tools that help contributors generate and improve content are valuable. This paper presents our initial step towards building a full-fledged author assistant, focusing on suggesting infobox templates for articles. We build SVM classifiers that suggest an infobox template type, chosen from a large number of possible types, for Wikipedia articles without infoboxes. Unlike prior work on Wikipedia article classification, which deals with only a few label classes for named entity recognition, our study uses a much larger 337-class setup geared towards realistic deployment of an infobox suggestion tool. We also emphasize testing on articles without infoboxes, because labeled and unlabeled data exhibit different feature distributions, which departs from the typical assumption that both are drawn from the same underlying population.
