Generating information-rich taxonomy from Wikipedia

Although hyponymy relation acquisition has been studied extensively, how informative the acquired relations are has not been sufficiently discussed. We found that the hypernyms in automatically acquired hyponymy relations are often too vague or ambiguous to specify the meaning of their hyponyms. For instance, the hypernym "work" is vague and ambiguous in the hyponymy relations work/Avatar and work/The Catcher in the Rye. In this paper, we propose a simple method for generating intermediate concepts of hyponymy relations that make such vague hypernyms more specific. From the less informative relation work/Avatar, our method generates an information-rich hyponymy relation such as work / work by film director / work by James Cameron / Avatar. Furthermore, the generated relation work by film director/Avatar can be paraphrased into a new relation, movie/Avatar. Experiments showed that our method acquired 2,719,441 enriched hyponymy relations with one intermediate concept at a precision of 0.853, and another 6,347,472 hyponymy relations at a precision of 0.786.
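
The shape of an enriched relation can be illustrated with a minimal sketch. It assumes that an attribute-value pair for the hyponym, here (film director, James Cameron), is already available; the abstract does not say where such pairs come from or how they are selected, so the function name `enrich_relation`, its parameters, and the "X by Y" template are hypothetical and serve only to reproduce the example chain above.

```python
from typing import List


def enrich_relation(hypernym: str, hyponym: str,
                    attribute: str, value: str) -> List[str]:
    """Build a chain from a vague hypernym down to its hyponym.

    Hypothetical illustration: the attribute-value pair is assumed to be
    given, and the "<hypernym> by <...>" template is taken from the
    example in the abstract, not from a documented algorithm.
    """
    by_attribute = f"{hypernym} by {attribute}"  # e.g. "work by film director"
    by_value = f"{hypernym} by {value}"          # e.g. "work by James Cameron"
    return [hypernym, by_attribute, by_value, hyponym]


chain = enrich_relation("work", "Avatar", "film director", "James Cameron")
print(" / ".join(chain))
# prints: work / work by film director / work by James Cameron / Avatar
```

Each intermediate concept narrows the one before it, so the chain reads from the vaguest concept on the left to the specific instance on the right.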
