Generating Information-Rich Taxonomy Using Wikipedia

Hyponymy relation acquisition has been extensively studied. However, the informativeness of acquired hypernyms has not been sufficiently discussed. We found that the hypernyms in automatically acquired hyponymy relations are often too vague for their hyponyms. For instance, “work” is a vague hypernym in “work → Seven Samurai” and “work → 1Q84”. Such vague hypernyms can lower the accuracy of NLP applications such as information retrieval and question answering. In this paper, we propose a method that makes (vague) hypernyms more specific by exploiting Wikipedia. For instance, our method generates two intermediate nodes, “work by Akira Kurosawa” and “work by film director”, for the original hyponymy relation “work → Seven Samurai”. We show that our method acquires 2,719,441 hyponymy relations with the first intermediate concepts (such as “work by Akira Kurosawa”) with 85.3% weighted precision and 6,347,472 hyponymy relations with the second intermediate concepts (such as “work by film director”) with 78.6% weighted precision. Furthermore, we confirm that hyponymy relations acquired by our method can be interpreted as “object-attribute-value”.
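As a rough illustration of the output structure only (not the acquisition method, which the abstract does not detail), the enriched relation from the running example can be sketched as a small record. The class name `EnrichedRelation`, its field names, the English "X by Y" template, and the ordering of the chain from most general to most specific are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EnrichedRelation:
    """Hypothetical container for one hyponymy relation plus its intermediate nodes."""

    hypernym: str   # original (vague) hypernym, e.g. "work"
    hyponym: str    # e.g. "Seven Samurai"
    value: str      # attribute value, e.g. "Akira Kurosawa"
    attribute: str  # attribute name, e.g. "film director"

    def first_intermediate(self) -> str:
        # First intermediate concept: hypernym specialized by the value.
        return f"{self.hypernym} by {self.value}"

    def second_intermediate(self) -> str:
        # Second intermediate concept: hypernym specialized by the attribute.
        return f"{self.hypernym} by {self.attribute}"

    def chain(self) -> List[str]:
        # Assumed ordering, from most general to most specific node.
        return [
            self.hypernym,
            self.second_intermediate(),
            self.first_intermediate(),
            self.hyponym,
        ]

    def as_triple(self) -> Tuple[str, str, str]:
        # Reading the relation as an (object, attribute, value) triple.
        return (self.hyponym, self.attribute, self.value)


relation = EnrichedRelation(
    hypernym="work",
    hyponym="Seven Samurai",
    value="Akira Kurosawa",
    attribute="film director",
)
print(" -> ".join(relation.chain()))
print(relation.as_triple())
```

Under these assumptions, the same record yields both the taxonomy path and the object-attribute-value reading mentioned at the end of the abstract.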
