Kernel logistic PLS: A tool for supervised nonlinear dimensionality reduction and binary classification

Kernel logistic PLS (KL-PLS) is a new tool for supervised nonlinear dimensionality reduction and binary classification. KL-PLS combines the construction of PLS latent variables with learning in kernel-induced feature spaces. The algorithm can be seen as a supervised dimensionality reduction step (which controls model complexity) followed by classification with logistic regression. KL-PLS is applied to 11 benchmark data sets for binary classification and to three medical problems; in all cases it proves competitive with state-of-the-art classifiers such as support vector machines. Moreover, because it relies on a succession of ordinary and logistic regressions carried out on only a small number of uncorrelated latent variables, KL-PLS can handle high-dimensional data. The proposed approach is simple and easy to implement, provides efficient complexity control through dimensionality reduction, and allows visual inspection of the data segmentation.
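To make the two-stage structure concrete, the sketch below outlines one plausible reading of KL-PLS in Python: each latent component weights the centered kernel columns by univariate logistic-regression coefficients, the kernel is deflated, and a final logistic regression is fit on the handful of components. This is a minimal illustration only, assuming an RBF kernel and scikit-learn's LogisticRegression; the function names (rbf_kernel, kl_pls_fit), the choice of gamma, and the simplified column deflation are our assumptions, not the authors' reference implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

def rbf_kernel(X, Z, gamma=0.1):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - z_j||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kl_pls_fit(X, y, n_components=3, gamma=0.1):
    # Step 0: kernel matrix, double-centered as in kernel PCA/PLS
    K = rbf_kernel(X, X, gamma)
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kh = J @ K @ J

    T = []  # latent components t_1, ..., t_m
    for _ in range(n_components):
        # Weight a_j = coefficient of a univariate logistic regression of y
        # on the j-th (deflated) kernel column: the "logistic" step of the
        # component construction
        a = np.array([
            LogisticRegression().fit(Kh[:, [j]], y).coef_[0, 0]
            for j in range(n)
        ])
        a /= np.linalg.norm(a)
        t = Kh @ a
        t /= np.linalg.norm(t)
        T.append(t)
        # Simplified deflation: remove from each kernel column its
        # projection on the current component
        Kh = Kh - np.outer(t, t @ Kh)

    T = np.column_stack(T)
    # Final step: ordinary logistic regression on the few uncorrelated
    # components (the classification stage of KL-PLS)
    clf = LogisticRegression().fit(T, y)
    return clf, T

Scoring a new point would additionally require the kernel vector between that point and the training sample, projected through the stored component weights; that bookkeeping is omitted here to keep the sketch short.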
