Machine transliteration using multiple transliteration engines and hypothesis re-ranking

This paper describes a novel method of improving machine transliteration by generating multiple transliteration hypotheses and re-ranking them. We constructed seven machine-transliteration engines to produce a set of transliteration hypotheses and then re-ranked the hypotheses to select the correct one. The proposed re-ranking method uses confidence-score, language-model, and Web-frequency features and combines them with machine-learning algorithms, including support vector machines and the maximum entropy model. Our tests on English-to-Japanese and English-to-Korean transliteration showed that the individual transliteration engines used in our approach performed comparably to previous approaches and that re-ranking improved word accuracy over the best individual engine from about 65% to 88%.
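The re-ranking idea can be illustrated with a minimal sketch. It is not the paper's implementation: the paper learns the combination with support vector machines and the maximum entropy model, whereas this sketch substitutes a fixed linear weighting. The function names, the weights, the toy language-model scores, and the toy Web counts are all hypothetical values chosen for illustration.

```python
import math
from collections import defaultdict


def pool_hypotheses(engine_outputs):
    """Merge per-engine n-best lists into {hypothesis: [confidence, ...]}."""
    pooled = defaultdict(list)
    for nbest in engine_outputs:
        for hypothesis, confidence in nbest:
            pooled[hypothesis].append(confidence)
    return pooled


def extract_features(hypothesis, confidences, lm_logprob, web_count):
    """Build the three feature types named in the abstract (toy versions)."""
    return {
        "confidence": max(confidences),                  # best engine confidence
        "lm": lm_logprob(hypothesis),                    # language-model log-probability
        "web": math.log1p(web_count(hypothesis)),        # damped Web frequency
    }


def rerank(engine_outputs, weights, lm_logprob, web_count):
    """Return pooled hypotheses sorted by a weighted feature sum.

    Assumption: a hand-set linear scorer stands in for the learned
    SVM / maximum entropy models described in the paper.
    """
    scored = []
    for hypothesis, confidences in pool_hypotheses(engine_outputs).items():
        feats = extract_features(hypothesis, confidences, lm_logprob, web_count)
        score = sum(weights[name] * value for name, value in feats.items())
        scored.append((score, hypothesis))
    # Highest score first; ties broken by hypothesis string for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [hypothesis for _, hypothesis in scored]


# Toy data (hypothetical): three engines transliterating English "data".
engine_outputs = [
    [("データ", 0.6), ("デタ", 0.4)],
    [("ダータ", 0.7), ("データ", 0.3)],
    [("データ", 0.5), ("ダータ", 0.5)],
]
toy_lm = {"データ": -2.0, "デタ": -6.0, "ダータ": -5.0}
toy_web = {"データ": 1_000_000, "デタ": 10, "ダータ": 200}
weights = {"confidence": 1.0, "lm": 0.5, "web": 0.2}

ranked = rerank(engine_outputs, weights, toy_lm.get, toy_web.get)
print(ranked[0])
```

In this toy setting the second engine prefers an incorrect hypothesis, but the language-model and Web-frequency features move the pooled candidate that is fluent and frequent on the Web to the top of the list. A learned model would replace the fixed `weights` dictionary.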
