Fast Iterative Kernel PCA

We introduce two methods for improving the convergence of the Kernel Hebbian Algorithm (KHA) for iterative kernel PCA. KHA has a scalar gain parameter that is either held constant or decreased as 1/t, which leads to slow convergence. Our KHA/et algorithm accelerates KHA by using the reciprocals of the current eigenvalue estimates as a gain vector. We then derive and apply Stochastic Meta-Descent (SMD) to KHA/et; this further speeds convergence by performing gain adaptation in a reproducing kernel Hilbert space (RKHS). Experimental results for kernel PCA and spectral clustering of USPS digits, as well as for motion capture and image de-noising problems, confirm that our methods converge substantially faster than conventional KHA.
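
The idea of an eigenvalue-scaled gain vector can be illustrated with a minimal sketch. It works in plain input space (Sanger's generalized Hebbian rule) rather than in an RKHS, and it is not the KHA/et update itself. Each component's gain is a base rate divided by a running estimate of that component's eigenvalue. The function names, the toy data, the exponential-average eigenvalue estimate, and the parameters `eta0` and `decay` are all assumptions made for this illustration.

```python
import math
import random


def make_data(n=400, seed=0):
    """Zero-mean 2-D toy data with principal axes (1,1)/sqrt(2) and (1,-1)/sqrt(2)."""
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(2.0)
    data = []
    for _ in range(n):
        a = rng.uniform(-3.0, 3.0)  # variance 3 along the first axis
        b = rng.uniform(-1.0, 1.0)  # variance 1/3 along the second axis
        data.append([s * (a + b), s * (a - b)])
    return data


def hebbian_pca_eigen_gain(data, n_components, eta0=0.02, decay=0.99, epochs=10):
    """Sanger's rule with a per-component gain eta0 / (estimated eigenvalue).

    Hypothetical illustration only: linear, not kernelized, and the
    eigenvalue estimate is a simple running average of y_i**2.
    """
    dim = len(data[0])
    # Assumption: initialise with the first unit basis vectors.
    W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(n_components)]
    lam = [1.0] * n_components  # running eigenvalue estimates
    for _ in range(epochs):
        for x in data:
            # Project the sample onto the current component estimates.
            y = [sum(wj * xj for wj, xj in zip(w, x)) for w in W]
            # Update the eigenvalue estimates (exponential moving average).
            lam = [decay * l + (1.0 - decay) * yi * yi for l, yi in zip(lam, y)]
            residual = list(x)
            new_W = []
            for i in range(n_components):
                # Deflation: remove the reconstruction from components 0..i.
                residual = [r - y[i] * wij for r, wij in zip(residual, W[i])]
                # Gain vector entry: reciprocal of the estimated eigenvalue.
                gain = eta0 / max(lam[i], 1e-6)
                new_W.append([wij + gain * y[i] * r for wij, r in zip(W[i], residual)])
            W = new_W
    return W, lam
```

Calling `hebbian_pca_eigen_gain(make_data(), 2)` returns the learned directions and the eigenvalue estimates. Dividing by the eigenvalue estimate gives low-variance components a proportionally larger step, so they are not held back by a single scalar gain tuned for the dominant component.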
