Mining New Translation Lexicons from Wikipedia

In this paper, we propose a method for mining new translation lexicons from Wikipedia. First, we construct a network whose nodes represent Wikipedia articles and whose links represent the dependencies between articles. We then learn a model of translation lexicons by examining the co-occurrence patterns of existing translation lexicons in this Wikipedia network. Finally, we find new translation lexicons by estimating the cross-lingual similarity between nodes in different languages. Experiments show that our method effectively finds new translation lexicons.
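As a rough illustration of the last step, the sketch below scores a candidate pair of articles by how many of their linked neighbors are already known translations of each other. It is a minimal sketch, not the model learned in the paper: the Jaccard overlap, the function names, and the toy English/German data are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Links = Dict[str, Set[str]]


def cross_lingual_similarity(src: str, tgt: str, src_links: Links,
                             tgt_links: Links, seed: Dict[str, str]) -> float:
    """Jaccard overlap between tgt's neighbors and src's neighbors
    after mapping the latter through the known (seed) translations.
    ASSUMPTION: a stand-in for the learned model, chosen for simplicity."""
    mapped = {seed[n] for n in src_links.get(src, set()) if n in seed}
    tgt_neighbors = tgt_links.get(tgt, set())
    union = mapped | tgt_neighbors
    if not union:
        return 0.0
    return len(mapped & tgt_neighbors) / len(union)


def rank_candidates(src: str, candidates: List[str], src_links: Links,
                    tgt_links: Links, seed: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return candidate target articles sorted by descending similarity."""
    scored = [(c, cross_lingual_similarity(src, c, src_links, tgt_links, seed))
              for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy network: article -> set of linked articles.
src_links: Links = {"Moon": {"Earth", "Orbit"}}
tgt_links: Links = {"Mond": {"Erde", "Umlaufbahn"}, "Sonne": {"Erde", "Stern"}}
# Existing translation lexicon used as the seed (hypothetical entries).
seed = {"Earth": "Erde", "Orbit": "Umlaufbahn"}

ranking = rank_candidates("Moon", ["Sonne", "Mond"], src_links, tgt_links, seed)
print(ranking)
```

On this toy data, "Mond" ranks above "Sonne" because both of its neighbors are seed translations of the neighbors of "Moon". A learned model would replace the fixed overlap score with weights estimated from the co-occurrence patterns of the existing lexicon.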
