Robust Dictionary Lookup in Multiple Noisy Orthographies

We present the MultiScript Phonetic Search algorithm, which addresses the problem of language learners looking up unfamiliar words that they have heard. We apply it to Arabic dictionary lookup with noisy queries written in either the Arabic or the Roman script. Our algorithm is based on a computational phonetic distance metric that can optionally be machine-learned. To benchmark its performance, we created the ArabScribe dataset, which contains 10,000 noisy transcriptions of random Arabic dictionary words. Our algorithm outperforms Google Translate's "did you mean" feature, as well as the Yamli smart Arabic keyboard.
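
As a rough illustration of what lookup by phonetic distance can look like, the sketch below ranks dictionary entries by a weighted edit distance in which substitutions between similar-sounding letters are cheap. It is not the MultiScript Phonetic Search algorithm itself: the letter groups, the cost values, the function names, and the three-word romanized lexicon are all hypothetical, and the sketch handles Roman-script input only. In a learned variant, the hand-set costs would presumably be replaced by values fitted to data.

```python
from typing import List

# Hypothetical groups of Roman letters treated as phonetically close.
# These groups and all cost values below are illustrative assumptions.
SIMILAR_GROUPS = [set("td"), set("sz"), set("kqg"), set("aeiou"), set("hx")]


def sub_cost(a: str, b: str) -> float:
    """Cost of substituting letter a with letter b."""
    if a == b:
        return 0.0
    for group in SIMILAR_GROUPS:
        if a in group and b in group:
            return 0.3  # similar-sounding letters are cheap to confuse
    return 1.0


def indel_cost(c: str) -> float:
    """Cost of inserting or deleting c; vowels are cheaper to drop or add."""
    return 0.5 if c in "aeiou" else 1.0


def phonetic_distance(s: str, t: str) -> float:
    """Weighted edit distance between s and t via dynamic programming."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + indel_cost(s[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + indel_cost(t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost(s[i - 1]),               # delete from s
                d[i][j - 1] + indel_cost(t[j - 1]),               # insert into s
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),   # substitute
            )
    return d[m][n]


def lookup(query: str, lexicon: List[str], k: int = 3) -> List[str]:
    """Return the k lexicon entries closest to the noisy query."""
    return sorted(lexicon, key=lambda w: phonetic_distance(query, w))[:k]


if __name__ == "__main__":
    # Toy romanized lexicon, for illustration only.
    toy_lexicon = ["kitab", "kalb", "bayt"]
    print(lookup("qitab", toy_lexicon, k=1))
```

With these assumed costs, a query that confuses "q" and "k" still lands on the intended entry, because that substitution is far cheaper than any edit path to the other words.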
