Extracting English-Korean Transliteration Pairs from Web Corpora

Transliteration pair acquisition has received significant attention as a technique for constructing up-to-date transliteration lexicons and for supporting machine translation and cross-language information retrieval. Previous studies on transliteration pair acquisition focused only on phonetic similarity models and seldom considered how transliterations are actually used in texts. Moreover, previous web-based validation models considered only one-way validation (validation from the viewpoint of the source term) rather than joint validation between a source term and a target term. To address these problems, we propose a novel transliteration pair acquisition model that extracts transliteration pairs from the Web and validates them by combining a phonetic similarity model with a joint web-validation model. Experiments demonstrated that our transliteration pair acquisition model was effective.
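
The difference between one-way and joint validation, and the idea of combining a web-based score with a phonetic similarity score, can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the `WebCounts` container, the Dice-style joint score, the linear interpolation, and the `weight` parameter are hypothetical and are not taken from the paper. The phonetic similarity is treated as a precomputed value in [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebCounts:
    """Hypothetical page counts for a candidate (source, target) pair."""
    source_hits: int  # pages containing the source (English) term
    target_hits: int  # pages containing the target (Korean) term
    joint_hits: int   # pages containing both terms


def one_way_validation(counts: WebCounts) -> float:
    """Score from the source term's viewpoint only: share of source pages
    that also contain the target term."""
    if counts.source_hits == 0:
        return 0.0
    return counts.joint_hits / counts.source_hits


def joint_validation(counts: WebCounts) -> float:
    """Symmetric score using both terms' counts (Dice coefficient).
    This choice is an assumption, not the paper's definition."""
    denom = counts.source_hits + counts.target_hits
    if denom == 0:
        return 0.0
    return 2 * counts.joint_hits / denom


def combined_score(phonetic: float, counts: WebCounts, weight: float = 0.5) -> float:
    """Linear interpolation of a phonetic similarity score and the joint
    web-validation score; `weight` is an illustrative parameter."""
    return weight * phonetic + (1 - weight) * joint_validation(counts)


# Made-up counts for a single candidate pair.
example = WebCounts(source_hits=100, target_hits=50, joint_hits=30)
example_score = combined_score(0.8, example)
```

In this sketch, a pair scores high only when it is both phonetically plausible and the two terms co-occur often relative to the frequency of each term, which is one way to read "joint validation between a source term and a target term".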
