Approximate nearest neighbor regression in very high dimensions

Fast approximate nearest-neighbor search methods have recently become popular for scaling nonparametric regression to more complex, high-dimensional applications. As an alternative to fast nearest-neighbor search, training data can be incorporated online into appropriate sufficient statistics and adaptive data structures, so that approximate nearest-neighbor predictions are accelerated by orders of magnitude by exploiting the compact representations of these sufficient statistics. This chapter describes such an approach for locally weighted regression with locally linear models. We first focus on local dimensionality reduction techniques that scale locally weighted learning to domains with very high-dimensional input data. The key issue is obtaining a statistically robust and computationally inexpensive estimate of local linear models in such large spaces, despite potentially irrelevant and redundant inputs. We develop a local version of partial least squares regression that fulfills all of these requirements and embed it in an incremental nonlinear regression algorithm that can be shown to work efficiently in a number of complex applications. In the second part of the chapter, we introduce a novel Bayesian formulation of partial least squares regression that converts our nonparametric regression approach into a probabilistic formulation. With this new algorithm, some of the heuristic components inherent in partial least squares can be eliminated by means of efficient Bayesian regularization techniques. Evaluations are provided for all algorithms on various synthetic data sets and on real-time learning examples with anthropomorphic robots and complex simulations.
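To make the idea of folding training data online into sufficient statistics concrete, the following is a minimal sketch of a single locally linear model with a Gaussian weighting kernel. It is an illustration under stated assumptions, not the chapter's algorithm: it solves the local fit by ridge-regularized weighted least squares rather than by the local partial least squares projections described above, and the class name `LocalLinearModel` and the parameters `center`, `bandwidth`, and `ridge` are hypothetical.

```python
import numpy as np


class LocalLinearModel:
    """One locally linear model around `center` (illustrative sketch only).

    Assumption: plain weighted least squares with a small ridge term,
    not the partial least squares variant the chapter develops.
    """

    def __init__(self, center, bandwidth, ridge=1e-8):
        self.center = np.asarray(center, dtype=float)
        self.bandwidth = float(bandwidth)
        d = self.center.size + 1          # inputs plus a bias term
        self.A = ridge * np.eye(d)        # accumulates sum_i w_i z_i z_i^T
        self.b = np.zeros(d)              # accumulates sum_i w_i z_i y_i

    def _features(self, x):
        # Center the input and append a constant 1 for the bias.
        return np.append(np.asarray(x, dtype=float) - self.center, 1.0)

    def weight(self, x):
        # Gaussian kernel: points near the center count more.
        diff = np.asarray(x, dtype=float) - self.center
        return np.exp(-0.5 * (diff @ diff) / self.bandwidth ** 2)

    def update(self, x, y):
        # Fold one training point into the sufficient statistics;
        # the point itself does not need to be stored afterwards.
        w = self.weight(x)
        z = self._features(x)
        self.A += w * np.outer(z, z)
        self.b += w * z * y

    def predict(self, x):
        # Solve the weighted normal equations from the statistics alone.
        beta = np.linalg.solve(self.A, self.b)
        return self._features(x) @ beta
```

The memory cost of this sketch depends only on the input dimension, not on the number of training points seen. Solving the normal equations is cubic in the input dimension, which is one reason a projection-based local fit is attractive when inputs are very high dimensional.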
