A machine transliteration model based on correspondence between graphemes and phonemes

Machine transliteration is an automatic method for converting words in one language into phonetically equivalent ones in another language. There has been growing interest in the use of machine transliteration to assist machine translation and information retrieval. Three types of machine transliteration models (grapheme-based, phoneme-based, and hybrid) have been proposed. Surprisingly, there have been few reports of efforts to utilize the correspondence between source graphemes and source phonemes, although this correspondence plays an important role in machine transliteration. Furthermore, little work has been reported on ways to dynamically handle source graphemes and phonemes. In this paper, we propose a transliteration model that dynamically uses both graphemes and phonemes, particularly the correspondence between them. With this model, we have achieved better performance than has been reported for other models: improvements of about 15 to 41% in English-to-Korean transliteration and about 16 to 44% in English-to-Japanese transliteration.
