The Minimum Redundancy - Maximum Relevance Approach to Building Sparse Support Vector Machines

Building sparse SVMs has recently become an active research topic because of its potential applications in large-scale data mining tasks. One of the most popular approaches is to select a small subset of training samples and use them as the support vectors. In this paper, we explain that selecting the support vectors is equivalent to selecting a number of columns from the kernel matrix, which in turn is equivalent to selecting a subset of features in the feature selection domain. Hence, we propose to use an effective feature selection algorithm, namely the Minimum Redundancy - Maximum Relevance (MRMR) algorithm, to solve the support vector selection problem. The MRMR algorithm was then compared with two existing methods, namely the back-fitting (BF) and pre-fitting (PF) algorithms. Preliminary results showed that, in terms of generalization performance, MRMR generally outperformed the BF algorithm but was inferior to the PF algorithm. However, the MRMR approach was extremely efficient and significantly faster than the two compared algorithms.
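
The column-selection view described above can be illustrated with a minimal sketch. It treats each column of the kernel matrix as a candidate feature and greedily picks columns that are relevant to the labels and not redundant with columns already chosen. Several parts are assumptions made for illustration and are not taken from the paper: absolute Pearson correlation stands in for the relevance and redundancy measures, the score uses the difference form (relevance minus mean redundancy), the kernel is Gaussian, and the names `rbf_kernel`, `abs_corr`, and `mrmr_select_columns` are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian kernel matrix with K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2)."""
    sq = (X ** 2).sum(1)[:, None] + (Z ** 2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def abs_corr(a, b):
    """Absolute Pearson correlation; returns 0 if either vector is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if denom == 0 else abs(a @ b) / denom


def mrmr_select_columns(K, y, n_select):
    """Greedy MRMR-style selection over the columns of K.

    Assumption: correlation is used as a proxy for the relevance and
    redundancy measures; the paper's actual criterion may differ.
    """
    n_cols = K.shape[1]
    if n_select <= 0 or n_cols == 0:
        return []
    # Relevance of each column to the labels.
    relevance = np.array([abs_corr(K[:, j], y) for j in range(n_cols)])
    selected = [int(np.argmax(relevance))]
    redundancy_sum = np.zeros(n_cols)
    while len(selected) < min(n_select, n_cols):
        last = selected[-1]
        # Accumulate redundancy with the most recently selected column.
        redundancy_sum += np.array(
            [abs_corr(K[:, j], K[:, last]) for j in range(n_cols)]
        )
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf  # never pick the same column twice
        selected.append(int(np.argmax(score)))
    return selected


# Toy data (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)

K = rbf_kernel(X, X, gamma=0.5)
sv_idx = mrmr_select_columns(K, y, n_select=10)
K_sparse = K[:, sv_idx]  # reduced kernel matrix, one column per chosen sample
```

Each index in `sv_idx` identifies a training sample whose kernel column is kept, so the chosen samples play the role of the support vectors; a classifier would then be trained on `K_sparse` instead of the full kernel matrix.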
