Co-STAR: A Co-training Style Algorithm for Hyponymy Relation Acquisition from Structured and Unstructured Text

This paper proposes Co-STAR, a co-training style algorithm that acquires hyponymy relations simultaneously from structured and unstructured text. In Co-STAR, two independent hyponymy-relation acquisition processes, one handling structured text and the other handling unstructured text, collaborate by repeatedly exchanging the knowledge they have acquired about hyponymy relations. Unlike conventional co-training, the two processes in Co-STAR are applied to different source texts and use different training data. We show the effectiveness of this algorithm through experiments on large-scale hyponymy-relation acquisition from Japanese Wikipedia and Web texts. We also show that Co-STAR is robust against noisy training data.
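The collaboration described above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy vote-counting classifier, the feature names, the confidence threshold, and the rule that only candidates seen by both processes are exchanged are all assumptions made for illustration. It shows only the general shape of a co-training style loop in which each process has its own source text and its own training data.

```python
from collections import Counter


class ToyClassifier:
    """Hypothetical stand-in for a real learner: sums per-feature label votes."""

    def __init__(self):
        self.votes = Counter()

    def fit(self, labeled):
        # labeled: {(hyponym, hypernym): (features, label)}, label is +1 or -1
        self.votes = Counter()
        for features, label in labeled.values():
            for feature in features:
                self.votes[feature] += label

    def score(self, features):
        # Sign is the predicted label; magnitude is used as a crude confidence.
        return sum(self.votes[feature] for feature in features)


def transfer(teacher, teacher_pool, learner_pool, learner_train, threshold):
    """Confident teacher labels for candidates the learner also observes."""
    handed_over = {}
    for pair, features in teacher_pool.items():
        if pair in learner_pool and pair not in learner_train:
            s = teacher.score(features)
            if abs(s) >= threshold:
                # The learner stores the pair with its OWN features.
                handed_over[pair] = (learner_pool[pair], 1 if s > 0 else -1)
    return handed_over


def co_training_style(train_s, train_u, pool_s, pool_u, rounds=5, threshold=2):
    """Alternate training and knowledge exchange between the two processes."""
    train_s, train_u = dict(train_s), dict(train_u)  # do not mutate inputs
    clf_s, clf_u = ToyClassifier(), ToyClassifier()
    for _ in range(rounds):
        clf_s.fit(train_s)
        clf_u.fit(train_u)
        new_for_u = transfer(clf_s, pool_s, pool_u, train_u, threshold)
        new_for_s = transfer(clf_u, pool_u, pool_s, train_s, threshold)
        if not new_for_u and not new_for_s:
            break  # nothing left to exchange
        train_u.update(new_for_u)
        train_s.update(new_for_s)
    clf_s.fit(train_s)
    clf_u.fit(train_u)
    return clf_s, clf_u, train_s, train_u


# Illustrative data only: the "structured" side uses layout-like features,
# the "unstructured" side uses lexical-pattern-like features.
train_s = {
    ("dog", "animal"): (["parent_heading", "list_item"], 1),
    ("cat", "animal"): (["parent_heading", "list_item"], 1),
    ("animal", "dog"): (["sibling"], -1),
}
train_u = {
    ("apple", "fruit"): (["such_as"], 1),
    ("pear", "fruit"): (["such_as"], 1),
    ("fruit", "apple"): (["and"], -1),
}
pool_s = {
    ("tulip", "flower"): ["parent_heading", "list_item"],
    ("flower", "tulip"): ["sibling"],
}
pool_u = {
    ("tulip", "flower"): ["is_a_kind_of"],
    ("rose", "flower"): ["is_a_kind_of"],
}

clf_s, clf_u, final_s, final_u = co_training_style(train_s, train_u, pool_s, pool_u)
```

In this toy run the structured-text process is confident about ("tulip", "flower") and hands it to the unstructured-text process, which thereby gains evidence for a textual pattern that its own training data never contained.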
