Parallel and multi-objective EDAs to create multivariate calibration models for quantitative chemical applications

This paper describes the application of several data mining approaches to a calibration problem in quantitative chemistry. Experimental data obtained from reactions involving known concentrations of two or more components are used to calibrate a model that is later used to predict the (unknown) concentrations of those components in a new reaction. The problem can be cast as combined selection and prediction: the goal is to predict the target variables accurately while minimizing the number of input variables needed, retaining only a small subset of truly significant ones. Initial approaches to the problem were principal component analysis and filtering. We then applied smarter variable-reduction methods, using parallel estimation of distribution algorithms (EDAs) to choose subsets of variables that yield models with lower average prediction errors. As a final step, we used multi-objective parallel EDAs to present a set of Pareto-optimal solutions instead of a single solution.
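To make the approach concrete, below is a minimal sketch of EDA-driven variable selection, assuming a UMDA-style univariate binary EDA. It is an illustration only, not the authors' parallel implementation; the function and parameter names (`umda_feature_selection`, `fitness`, the elite fraction, the penalty weight) are hypothetical.

```python
import numpy as np


def umda_feature_selection(fitness, n_features, pop_size=50, n_elite=25,
                           n_generations=30, seed=None):
    """UMDA-style binary EDA for variable (feature) subset selection.

    `fitness(mask)` must return a scalar to minimize, e.g. the
    cross-validated prediction error of a calibration model built on
    the variables where `mask` is True.
    """
    rng = np.random.default_rng(seed)
    p = np.full(n_features, 0.5)          # marginal inclusion probabilities
    best_mask, best_score = None, np.inf
    for _ in range(n_generations):
        # Sample a population of binary inclusion masks from the model.
        pop = rng.random((pop_size, n_features)) < p
        scores = np.array([fitness(mask) for mask in pop])
        # Truncation selection: keep the best individuals.
        elite = pop[np.argsort(scores)[:n_elite]]
        # Re-estimate the univariate marginals from the elite set.
        p = np.clip(elite.mean(axis=0), 0.02, 0.98)
        if scores.min() < best_score:
            best_score = scores.min()
            best_mask = pop[np.argmin(scores)].copy()
    return best_mask, best_score
```

A toy usage with an ordinary least-squares calibration model (again purely illustrative):

```python
# Select inputs for a least-squares model on synthetic calibration data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))            # 20 candidate input variables
y = X[:, 0] + 2.0 * X[:, 3] + 0.1 * rng.normal(size=100)

def fitness(mask):
    if not mask.any():                    # empty subsets are invalid
        return np.inf
    coef, *_ = np.linalg.lstsq(X[:, mask], y, rcond=None)
    err = np.mean((X[:, mask] @ coef - y) ** 2)
    return err + 0.01 * mask.sum()        # penalize large subsets

mask, score = umda_feature_selection(fitness, n_features=20, seed=1)
print("selected variables:", np.flatnonzero(mask))
```

Note that the scalar penalty on `mask.sum()` collapses the two objectives (prediction error and subset size) into one; the multi-objective EDA stage described above would instead retain the whole set of non-dominated (error, size) trade-offs rather than a single scalarized optimum.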
