Evolution of Gaussian Process kernels for machine translation post-editing effort estimation

Many Natural Language Processing problems require combining machine learning with optimization techniques. One such problem is estimating the human effort needed to improve a text produced by a machine translation method. Recent work in this area has shown that Gaussian Processes can be effective for predicting post-editing effort. However, a Gaussian Process requires a kernel function to be defined, and the choice of kernel strongly influences the quality of the prediction. A second difficulty is that extracting features from the text can be very labor-intensive, although recent advances in sentence embedding have shown that this step can be automated. In this paper, we use a Genetic Programming algorithm to evolve kernels for Gaussian Processes that predict post-editing effort from sentence embeddings. We show that combining evolutionary optimization with Gaussian Processes removes the need to specify the kernel a priori and that, with a multi-objective variant of the Genetic Programming approach, kernels suitable for predicting several metrics can be learned. We also investigate how the choice of sentence embedding method affects the kernel learning process.
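As a rough illustration of why the kernel can be treated as a search target, the sketch below represents kernels as small expression trees and scores each one by the negative log marginal likelihood of a Gaussian Process on toy data. It is a minimal sketch under stated assumptions, not the method of this paper: the primitive set (`rbf`, `lin`, `add`, `mul`), the fixed noise variance, the toy two-dimensional "embeddings", and the use of marginal likelihood as the fitness are all illustrative choices, and the candidate list stands in for a population that a Genetic Programming algorithm would generate and vary.

```python
import math

def evaluate(tree, x, y):
    """Evaluate a kernel expression tree on two vectors (hypothetical encoding)."""
    op = tree[0]
    if op == "rbf":  # squared exponential with length-scale tree[1]
        sq = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-sq / (2.0 * tree[1] ** 2))
    if op == "lin":  # linear (dot-product) kernel
        return sum(a * b for a, b in zip(x, y))
    if op == "add":  # sums of valid kernels are valid kernels
        return evaluate(tree[1], x, y) + evaluate(tree[2], x, y)
    if op == "mul":  # products of valid kernels are valid kernels
        return evaluate(tree[1], x, y) * evaluate(tree[2], x, y)
    raise ValueError(f"unknown node {op!r}")

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def neg_log_marginal_likelihood(tree, X, t, noise=0.1):
    """Negative log marginal likelihood of targets t under a zero-mean GP.

    `noise` is an assumed, fixed noise variance added to the diagonal.
    """
    n = len(X)
    K = [[evaluate(tree, X[i], X[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # Forward substitution solves L z = t, so that t^T K^{-1} t = z^T z.
    z = []
    for i in range(n):
        z.append((t[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    data_fit = 0.5 * sum(v * v for v in z)
    complexity = sum(math.log(L[i][i]) for i in range(n))  # equals 0.5 * log|K|
    return data_fit + complexity + 0.5 * n * math.log(2.0 * math.pi)

# Toy stand-ins for sentence embeddings and effort scores (made-up values).
X = [[0.0, 0.1], [0.5, 0.4], [1.0, 0.9], [1.5, 1.6], [2.0, 2.1]]
t = [0.1, 0.4, 0.8, 0.9, 0.3]

# A hand-written candidate set; a GP search would generate and mutate such trees.
candidates = [
    ("rbf", 1.0),
    ("lin",),
    ("add", ("rbf", 0.5), ("lin",)),
    ("mul", ("rbf", 2.0), ("lin",)),
]

scores = [(neg_log_marginal_likelihood(k, X, t), k) for k in candidates]
best_score, best_kernel = min(scores, key=lambda pair: pair[0])
print("best kernel:", best_kernel, "score:", round(best_score, 3))
```

Because sums and products of valid kernels are themselves valid kernels, every tree built from these primitives yields a positive semidefinite Gram matrix, which is what makes an unconstrained tree search workable in this toy setting. A multi-objective variant would return one such score per target metric instead of a single number.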
