Randomized Nonlinear Component Analysis

Classical methods such as Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are ubiquitous in statistics. However, these techniques can only reveal linear relationships in data. Although nonlinear variants of PCA and CCA have been proposed, they are computationally prohibitive at large scale. In a separate strand of recent research, randomized methods have been proposed to construct features that help reveal nonlinear patterns in data. For basic tasks such as regression or classification, random features exhibit little or no loss in performance while achieving drastic savings in computational requirements. In this paper we leverage randomness to design scalable new variants of nonlinear PCA and CCA; our ideas extend to key multivariate analysis tools such as spectral clustering and linear discriminant analysis (LDA). We demonstrate our algorithms through experiments on real-world data, comparing against the state of the art. A simple R implementation of the presented algorithms is provided.
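
To make the recipe concrete, here is a minimal sketch in R of the idea the abstract describes: approximate a Gaussian-kernel feature map with random Fourier features (Rahimi and Recht), then apply the classical linear method to the mapped data. This is an illustration under stated assumptions, not the paper's released implementation; the function names and the defaults D and sigma are made up for the example, and the paper's actual estimators may differ (e.g., by regularizing the CCA step).

    # Random Fourier feature map approximating a Gaussian kernel with
    # bandwidth sigma: returns an n x D matrix of nonlinear features.
    random_features <- function(X, D = 500, sigma = 1) {
      d <- ncol(X)
      W <- matrix(rnorm(d * D, sd = 1 / sigma), d, D)  # random frequencies
      b <- runif(D, 0, 2 * pi)                         # random phases
      sqrt(2 / D) * cos(sweep(X %*% W, 2, b, "+"))
    }

    # Randomized nonlinear PCA: ordinary linear PCA on the random features.
    rpca <- function(X, D = 500, sigma = 1, k = 2) {
      prcomp(random_features(X, D, sigma), rank. = k)
    }

    # Randomized nonlinear CCA: classical linear CCA (stats::cancor) on the
    # two mapped views. For identifiability, keep D below the sample size
    # or add regularization, as a practical implementation would.
    rcca <- function(X, Y, D = 500, sigma = 1) {
      cancor(random_features(X, D, sigma), random_features(Y, D, sigma))
    }

Because the random map is data-independent and costs O(n d D) to apply, the overall procedure scales linearly in the number of observations, in contrast to exact kernel PCA or kernel CCA, which require operating on an n x n kernel matrix.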
