In-depth analysis of SVM kernel learning and its components

The performance of support vector machines on nonlinearly separable classification problems depends strongly on the kernel function. In pursuit of an automated machine learning approach for this technique, many studies have addressed the challenge of automatically learning well-performing kernels for support vector machines. However, these works have been carried out without a thorough analysis of the components that influence the behavior of support vector machines and their interaction with the kernel. These components are related in an intricate way, and it is difficult to provide a comprehensible analysis of their joint effect. In this paper, we try to fill this gap by introducing the steps necessary to understand these interactions and by providing clues that help the research community decide where to place the emphasis. First, we identify all the factors that affect the final performance of support vector machines in relation to the elicitation of kernels. Next, we analyze these factors independently or in pairs and study the influence each one has on the final classification performance, providing recommendations and insights into the kernel setting for support vector machines.
