Local Dimensionality Reduction for Non-Parametric Regression

Locally weighted regression is a computationally efficient technique for non-linear regression. However, for high-dimensional data this technique becomes numerically brittle and computationally too expensive if many local models need to be maintained simultaneously. Thus, local linear dimensionality reduction combined with locally weighted regression seems to be a promising solution. In this context, we review linear dimensionality-reduction methods, compare their performance on non-parametric locally-linear regression, and discuss their ability to extend to incremental learning. The methods considered fall into three groups: (1) reducing dimensionality only on the input data, (2) modeling the joint input-output data distribution, and (3) optimizing the correlation between projection directions and output data. Group 1 contains principal component regression (PCR); group 2 contains principal component analysis (PCA) in joint input and output space, factor analysis, and probabilistic PCA; and group 3 contains reduced-rank regression (RRR) and partial least squares (PLS) regression. Among the tested methods, only those in group 3 achieved robust performance even with a non-optimal number of components (factors or projection directions). In contrast, groups 1 and 2 failed when given fewer components, since these methods rely on a correct estimate of the true intrinsic dimensionality. Within group 3, PLS is the only method for which a computationally efficient incremental implementation exists. Thus, PLS appears to be ideally suited as a building block for a locally weighted regressor in which projection directions are added incrementally on the fly.
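To make the group-3 approach singled out above concrete, the following is a minimal batch-mode sketch of PLS regression for a single output, written as a NIPALS-style PLS1 procedure in Python with NumPy. The function and variable names are our own illustration, and the incremental, locally weighted variant discussed in the abstract would add projection directions on the fly rather than fitting a fixed number of them in batch as done here.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 regression (single output) with NIPALS-style deflation.
    Returns projection directions W, input loadings P, per-component
    regression coefficients b, and the data means used for centering."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # work on centered residuals

    W, P, b = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                        # direction with maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                           # latent score along this direction
        tt = t @ t
        p = Xr.T @ t / tt                    # input loading for deflation
        beta = (yr @ t) / tt                 # univariate regression of y on the score
        Xr = Xr - np.outer(t, p)             # deflate input residual
        yr = yr - beta * t                   # deflate output residual
        W.append(w); P.append(p); b.append(beta)
    return {"W": np.array(W), "P": np.array(P), "b": np.array(b),
            "x_mean": x_mean, "y_mean": y_mean}

def pls1_predict(model, X):
    """Predict by projecting inputs through the stored directions,
    deflating exactly as in training, and summing the component fits."""
    Xr = np.asarray(X, dtype=float) - model["x_mean"]
    y_hat = np.full(len(Xr), model["y_mean"])
    for w, p, beta in zip(model["W"], model["P"], model["b"]):
        t = Xr @ w
        y_hat += beta * t
        Xr = Xr - np.outer(t, p)
    return y_hat

# Illustration: a noisy target that depends on only two directions of a
# 10-dimensional input, so two PLS components are sufficient.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)
model = pls1_fit(X, y, n_components=2)
print(np.abs(y - pls1_predict(model, X)).mean())
```

Note that, unlike PCR, the projection directions here are driven by the covariance between inputs and output, which is why the method degrades gracefully when the number of components is smaller than the true intrinsic dimensionality.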
