Machine transliteration using multiple transliteration engines and hypothesis re-ranking

This paper describes a novel method of improving machine transliteration by generating multiple transliteration hypotheses and re-ranking them. We constructed seven machine-transliteration engines to produce a set of transliteration hypotheses and then re-ranked the hypotheses to select the correct one. The proposed re-ranking method uses confidence-score, language-model, and Web-frequency features and combines them with machine-learning algorithms, including support vector machines and the maximum entropy model. Our tests of English-to-Japanese and English-to-Korean transliteration showed that the individual transliteration engines used in our approach performed comparably to previous approaches and that re-ranking improved word accuracy over the best individual engine from about 65% to 88%.
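The re-ranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`train_char_bigram_lm`, `rerank`), the toy engine outputs, the toy Web counts, and the fixed feature weights are all assumptions made for illustration. In the paper the features are combined by a trained support vector machine or maximum entropy model; the hand-set linear weights below only stand in for such a learned combination.

```python
import math
from collections import Counter, defaultdict


def train_char_bigram_lm(words):
    """Build an add-one-smoothed character bigram model from target-language words."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for word in words:
        chars = ["<s>"] + list(word) + ["</s>"]
        vocab.update(chars)
        for prev, cur in zip(chars, chars[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    vocab_size = len(vocab)

    def avg_logprob(word):
        # Length-normalised log-probability, so long hypotheses are not penalised.
        chars = ["<s>"] + list(word) + ["</s>"]
        total = sum(
            math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
            for prev, cur in zip(chars, chars[1:])
        )
        return total / (len(chars) - 1)

    return avg_logprob


def rerank(engine_outputs, lm, web_counts, weights):
    """Pool hypotheses from several engines and sort them by a weighted feature sum."""
    # Confidence feature: engine confidence averaged over all engines
    # (an engine that did not propose a hypothesis contributes 0).
    confidence = defaultdict(float)
    for output in engine_outputs:
        for hypothesis, score in output.items():
            confidence[hypothesis] += score / len(engine_outputs)

    scored = []
    for hypothesis, conf in confidence.items():
        features = {
            "confidence": conf,
            "language_model": lm(hypothesis),
            "web_frequency": math.log1p(web_counts.get(hypothesis, 0)),
        }
        total = sum(weights[name] * value for name, value in features.items())
        scored.append((total, hypothesis))

    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [hypothesis for _, hypothesis in scored]


# Toy data (all values are made up): three engines transliterating "data" into katakana.
engine_outputs = [
    {"デタ": 0.6, "データ": 0.4},
    {"データ": 0.7, "ダタ": 0.3},
    {"デタ": 0.5, "データ": 0.5},
]
lm = train_char_bigram_lm(["データ", "デート", "メーター"])
web_counts = {"データ": 1000, "デタ": 3}  # hypothetical Web hit counts
weights = {"confidence": 1.0, "language_model": 1.0, "web_frequency": 1.0}

ranked = rerank(engine_outputs, lm, web_counts, weights)
print(ranked[0])
```

In this toy setup no single engine needs to rank the correct form first; the pooled confidence and the Web-frequency feature together lift it to the top, which is the effect the abstract attributes to re-ranking.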

Hitoshi Isahara | Jong-Hoon Oh

[1] Key-Sun Choi, et al. An Ensemble of Grapheme and Phoneme for Machine Transliteration, 2005, IJCNLP.

[2] 강병주, et al. 한국어 정보검색에서 외래어와 영어로 인한 단어불일치문제의 해결 = A resolution of word mismatch problem caused by foreign word transliterations and English words in Korean information retrieval, 2001.

[3] Andreas Stolcke, et al. SRILM - an extensible language modeling toolkit, 2002, INTERSPEECH.

[4] R. Schwartz, et al. The N-best algorithms: an efficient and exact procedure for finding the N most likely sentence hypotheses, 1990, International Conference on Acoustics, Speech, and Signal Processing.

[5] Jonathan G. Fiscus, et al. REDUCED WORD ERROR RATES, 1997.

[6] Tetsuya Ishikawa, et al. Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration, 2001, Comput. Humanit.

[7] Tadashi Nomoto. Multi-Engine Machine Translation with Voted Language Model, 2004, ACL.

[8] Daniel Marcu, et al. NP Bracketing by Maximum Entropy Tagging and SVM Reranking, 2004, EMNLP.

[9] Anoop Sarkar, et al. Discriminative Reranking for Machine Translation, 2004, NAACL.

[10] Hozumi Tanaka, et al. Improving Back-Transliteration by Combining Information Sources, 2004, IJCNLP.

[11] H. Isahara, et al. A Comparison of Different Machine Transliteration Models, 2006, J. Artif. Intell. Res.

[12] Aravind K. Joshi, et al. An SVM-based voting algorithm with application to parse reranking, 2003, CoNLL.

[13] Thorsten Joachims, et al. Learning to classify text using support vector machines - methods, theory and algorithms, 2002, The Kluwer international series in engineering and computer science.

[14] Naoto Kato, et al. Transliteration Considering Context Information based on the Maximum Entropy Method, 2003.

[15] Adam L. Berger, et al. A Maximum Entropy Approach to Natural Language Processing, 1996, CL.

[16] In-Ho Kang, et al. English-to-Korean Transliteration using Multiple Unbounded Overlapping Phoneme Chunks, 2000, COLING.

[17] Heng Ji, et al. Re-Ranking Algorithms for Name Tagging, 2006.

[18] Hitoshi Isahara, et al. Improving Machine Transliteration Performance by Using Multiple Transliteration Models, 2006, ICCPOL.

[19] Daisuke Kawahara, et al. Case Frame Compilation from the Web using High-Performance Computing, 2006, LREC.

[20] Kevin Knight, et al. Machine Transliteration, 1997, CL.

[21] Hermann Ney, et al. Cross-Site and Intra-Site ASR System Combination: Comparisons on Lattice and 1-Best Methods, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[22] Michael Collins, et al. Discriminative Reranking for Natural Language Parsing, 2000, CL.

[23] Key-Sun Choi, et al. An English-Korean Transliteration Model Using Pronunciation and Contextual Rules, 2002, COLING.

[24] Gregory Grefenstette, et al. Finding Ideographic Representations of Japanese Names Written in Latin Script via Language Identification and Corpus Validation, 2004, ACL.

[25] Ying Zhang, et al. Mining translations of OOV terms from the web through cross-lingual query expansion, 2005, SIGIR '05.

[26] Yaser Al-Onaizan, et al. Translating Named Entities Using Monolingual and Bilingual Resources, 2002, ACL.