Fast Iterative Kernel Principal Component Analysis

We develop gain adaptation methods that improve convergence of the kernel Hebbian algorithm (KHA) for iterative kernel PCA (Kim et al., 2005). KHA has a scalar gain parameter which is either held constant or decreased according to a predetermined annealing schedule, leading to slow convergence. We accelerate it by incorporating the reciprocal of the current estimated eigenvalues as part of a gain vector. An additional normalization term then allows us to eliminate a tuning parameter in the annealing schedule. Finally we derive and apply stochastic meta-descent (SMD) gain vector adaptation (Schraudolph, 1999, 2002) in reproducing kernel Hilbert space to further speed up convergence. Experimental results on kernel PCA and spectral clustering of USPS digits, motion capture and image denoising, and image super-resolution tasks confirm that our methods converge substantially faster than conventional KHA. To demonstrate scalability, we perform kernel PCA on the entire MNIST data set.
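
To make the idea concrete, here is a minimal NumPy sketch of a KHA-style update in which each component's gain is proportional to the reciprocal of a running eigenvalue estimate. It is an illustration under our own assumptions, not the authors' implementation: the kernel choice (`rbf_kernel`, `gamma`), the exponential-moving-average eigenvalue tracking, the normalization of the reciprocal gains by their largest component, and the parameter names (`eta0`, `decay`, `epochs`) are placeholders standing in for the paper's actual gain schedule, normalization term, and SMD adaptation.

```python
import numpy as np


def rbf_kernel(X, gamma=0.1):
    """Gram matrix of a Gaussian RBF kernel (illustrative kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kha_reciprocal_gains(K, r, epochs=50, eta0=0.5, decay=0.99, seed=0):
    """Sketch of iterative kernel PCA via a GHA-style update on the
    coefficient matrix A (rows = kernel expansion coefficients of the
    estimated eigenvectors), with per-component gains proportional to
    the reciprocal of crude running eigenvalue estimates."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering in feature space
    Kc = J @ K @ J
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=1e-2, size=(r, n))
    lam = np.ones(r)                          # running eigenvalue estimates
    for _ in range(epochs):
        for i in rng.permutation(n):          # one sweep over the data
            y = A @ Kc[:, i]                  # projections of sample i
            lam = decay * lam + (1.0 - decay) * y ** 2   # crude EMA tracking
            inv = 1.0 / np.maximum(lam, 1e-12)
            gain = eta0 * inv / inv.max()     # reciprocal gains, largest = eta0
            # Sanger/GHA update written in the kernel expansion coefficients:
            grad = -np.tril(np.outer(y, y)) @ A   # decorrelation term
            grad[:, i] += y                       # Hebbian term (y e_i^T)
            A += gain[:, None] * grad
    return A, lam


if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(200, 5))
    A, lam = kha_reciprocal_gains(rbf_kernel(X), r=4)
    print("running eigenvalue estimates (up to scaling):", np.sort(lam)[::-1])
```

Replacing `gain` with a fixed scalar (or a predetermined annealing schedule) recovers the conventional KHA baseline that the methods in the paper improve upon.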

[1] Michael I. Jordan, et al. On Spectral Clustering: Analysis and an algorithm, 2001, NIPS.

[2] Bernhard Schölkopf, et al. Iterative kernel principal component analysis for image modeling, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] S. V. N. Vishwanathan, et al. Fast Iterative Kernel PCA, 2006, NIPS.

[4] B. Yegnanarayana, et al. Artificial Neural Networks, 2004.

[5] John C. Platt, et al. Fast training of support vector machines using sequential minimal optimization, Advances in Kernel Methods, 1999.

[6] Lawrence D. Jackel, et al. Backpropagation Applied to Handwritten Zip Code Recognition, 1989, Neural Computation.

[7] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.

[8] Bernhard Schölkopf, et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 1998, Neural Computation.

[9] H. Robbins. A Stochastic Approximation Method, 1951.

[10] Shyang Chang, et al. An adaptive learning algorithm for principal component analysis, 1995, IEEE Transactions on Neural Networks.

[11] David Suter, et al. Human Motion De-noising via Greedy Kernel Principal Component Analysis Filtering, 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[12] Thorsten Joachims, et al. Making large scale SVM learning practical, 1998.

[13] Gunnar Rätsch, et al. Kernel PCA and De-Noising in Feature Spaces, 1998, NIPS.

[14] Nicol N. Schraudolph, et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent, 2002, Neural Computation.

[15] J. Karhunen. Optimization criteria and nonlinear PCA neural networks, 1994, Proceedings of the 1994 IEEE International Conference on Neural Networks (ICNN'94).

[16] Marina Meila, et al. Comparing clusterings: an axiomatic view, 2005, ICML.

[17] John E. Moody, et al. Towards Faster Stochastic Gradient Search, 1991, NIPS.

[18] Alexander J. Smola, et al. Learning with kernels, 1998.

[19] Juha Karhunen, et al. Representation and separation of signals using nonlinear PCA type learning, 1994, Neural Networks.

[20] Nicol N. Schraudolph, et al. Local Gain Adaptation in Stochastic Gradient Descent, 1999.

[21] David J. Kriegman, et al. From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose, 2001, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Andreas Griewank, et al. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Second Edition, 2000, Frontiers in Applied Mathematics.

[23] D. Munson. A note on Lena, 1996.

[24] Terence D. Sanger, et al. Optimal unsupervised learning in a single-layer linear feedforward neural network, 1989, Neural Networks.

[25] E. Oja, et al. On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix, 1985.

[26] Alexander J. Smola, et al. Step Size Adaptation in Reproducing Kernel Hilbert Space, 2006, Journal of Machine Learning Research.

[27] Luca Zanni, et al. Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems, 2006, Journal of Machine Learning Research.

[28] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.