Surrogate-assisted multi-objective model selection for support vector machines

Classification is one of the best-known tasks in supervised learning, and a vast number of pattern classification algorithms have been proposed. Among these, support vector machines (SVMs) are one of the most popular approaches, owing to the high performance they achieve in a wide range of pattern recognition applications. Nevertheless, the effectiveness of SVMs depends strongly on their hyper-parameters. Beyond hyper-parameter tuning, the way features are scaled, as well as the presence of irrelevant features, can also affect generalization performance. This paper introduces an approach to model selection for SVM classifiers in which a model may comprise feature selection and pre-processing methods in addition to the SVM itself. We formulate model selection as a multi-objective problem that simultaneously minimizes two components closely related to the error of a model: its bias and its variance, both estimated experimentally. A surrogate-assisted evolutionary multi-objective optimization approach is adopted to explore the hyper-parameter space, because estimating bias and variance is computationally expensive; the surrogate reduces the number of solutions that must be evaluated with the true fitness functions, and hence the overall computational cost. Experimental results on benchmark datasets widely used in the literature indicate that our proposal obtains highly competitive models with fewer fitness-function evaluations than state-of-the-art model selection methods.
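To make the two objectives concrete, the following is a minimal sketch of an experimental bias and variance estimate for a classifier under zero-one loss, in the spirit of the Domingos-style decomposition this line of work builds on. It uses scikit-learn's SVC rather than the paper's actual implementation, and the function name, bootstrap count, and label encoding are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils import resample

def estimate_bias_variance(X_train, y_train, X_test, y_test,
                           C=1.0, gamma="scale", n_bootstrap=30, seed=0):
    """Bootstrap estimate of the bias and variance components of the
    zero-one loss of an RBF-kernel SVM (Domingos-style decomposition).
    Assumes class labels are encoded as non-negative integers."""
    rng = np.random.RandomState(seed)
    preds = np.empty((n_bootstrap, len(y_test)), dtype=int)
    for b in range(n_bootstrap):
        # Train one SVM per bootstrap replicate of the training set.
        Xb, yb = resample(X_train, y_train, random_state=rng)
        preds[b] = SVC(C=C, gamma=gamma).fit(Xb, yb).predict(X_test)
    # Main prediction: the most frequent label per test point.
    main = np.array([np.bincount(col).argmax() for col in preds.T])
    bias = float(np.mean(main != y_test))      # systematic error of the main prediction
    variance = float(np.mean(preds != main))   # average disagreement with the main prediction
    return bias, variance
```

In the model selection setting described above, the (bias, variance) pair returned by such an estimator would serve as the two objectives handed to the surrogate-assisted evolutionary optimizer. Since each evaluation retrains the SVM n_bootstrap times, the motivation for screening candidate hyper-parameter configurations with a cheap surrogate before invoking the true fitness functions is apparent.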
