Statistical pattern recognition approaches for retrieval-based machine translation systems

This dissertation addresses Machine Translation (MT): the automated translation, by a computer, of a document written in one language (the source language) into another (the target language). MT requires several kinds of knowledge about both the source and target languages, e.g., linguistic rules and the exceptions to those rules. Traditional MT systems rely on an extensive parsing strategy to decode the linguistic rules and use a knowledge base to encode the exceptions. However, constructing the knowledge base becomes increasingly difficult as the translation system grows. To overcome this difficulty, real translation examples are used instead of a manually crafted knowledge base. This design strategy is known as the Example-Based Machine Translation (EBMT) principle. Traditional EBMT systems use a database of word or phrase translation pairs. The main challenge of this approach is combining the word or phrase translation units into a meaningful and fluent target text. This study proposes a novel Retrieval-Based Machine Translation (RBMT) system that uses the sentence as its translation unit. An advantage of the sentence-level unit is that sentence boundaries are explicitly defined and the meaning of each sentence is precise in both the source and target languages. The main challenge of a sentential translation unit is limited coverage, i.e., the difficulty of finding an exact match between a user query and the sentences in the source database. Using an electronic dictionary and a topic modeling procedure, we develop a method for obtaining clusters of sensible variations for each example in the source database. The coverage of our MT system improves because an input query is matched against a cluster of sensible variations of each translation example rather than against the original source example alone.
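The coverage idea above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the example database, the language pair, the synonym table (a stand-in for the electronic dictionary), the Jaccard word-overlap score, and the threshold are all assumptions chosen for illustration, and the topic modeling step is omitted entirely.

```python
from itertools import product

# Hypothetical toy database: source sentence -> stored target translation.
EXAMPLES = {
    "where is the station": "où est la gare",
    "i need a doctor": "j'ai besoin d'un médecin",
}

# Hypothetical synonym table standing in for an electronic dictionary.
SYNONYMS = {
    "need": ["require"],
    "doctor": ["physician"],
    "station": ["depot"],
}


def variations(sentence):
    """Return the cluster for a sentence: itself plus all synonym substitutions."""
    options = [[word] + SYNONYMS.get(word, []) for word in sentence.split()]
    return {" ".join(words) for words in product(*options)}


def jaccard(a, b):
    """Word-overlap similarity between two sentences (an assumed matching score)."""
    set_a, set_b = set(a.split()), set(b.split())
    return len(set_a & set_b) / len(set_a | set_b)


def translate(query, threshold=0.5):
    """Return the translation whose variation cluster best matches the query.

    Returns None when no cluster reaches the (assumed) similarity threshold.
    """
    best_score, best_target = 0.0, None
    for source, target in EXAMPLES.items():
        # Score the query against every variation in the cluster, keep the best.
        score = max(jaccard(query, v) for v in variations(source))
        if score > best_score:
            best_score, best_target = score, target
    return best_target if best_score >= threshold else None
```

In this sketch the query "i require a physician" has no exact match in `EXAMPLES`, yet it is retrieved because it appears in the variation cluster of "i need a doctor". A query with no overlapping words falls below the threshold and returns `None`.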
In addition, pattern recognition techniques are used to improve the matching procedure, specifically through the design of optimal pattern classifiers and the incorporation of subjective judgments. A high-performance statistical pattern classifier is used to identify the target sentences for an input query sentence in our MT system. The proposed classifier differs from a conventional classifier in how it addresses generalization. A conventional classifier addresses generalization through the parsimony principle and therefore risks choosing an oversimplified statistical model. The proposed classifier addresses generalization directly in terms of the training (empirical) data. It is expected to generalize better than conventional classifiers because, given the available training data, it is less likely to adopt an oversimplified statistical model. We further improve the matching procedure by incorporating subjective judgments. We formulate a novel cost function that combines subjective judgments with the degree of matching between the translation examples and an input query. We also provide an optimization strategy for this cost function so that the statistical model can be optimized according to the subjective judgments.
