Receptive Field Weighted Regression

Abstract: We introduce a constructive, incremental learning system for regression problems that models data by means of spatially localized linear models. In contrast to other approaches, the size and shape of the receptive field of each locally linear model, as well as the parameters of the locally linear model itself, are learned independently, i.e., without the need for competition or any other kind of communication between models. This is accomplished by incrementally minimizing a weighted, penalized local cross-validation error. The result is a learning system that can allocate resources as needed while addressing the bias-variance dilemma in a principled way. The spatial localization of the linear models increases robustness against negative interference. Our learning system can be interpreted as a nonparametric adaptive-bandwidth smoother, as a mixture of experts in which the experts are trained in isolation, and as a learning system that profits from combining independent expert knowledge on the same problem. It illustrates the potential of purely local learning and offers an interesting and powerful approach to learning with receptive fields.
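
As a concrete illustration of the ideas above, the following is a minimal Python sketch of prediction and incremental training with Gaussian receptive fields and independently trained local linear models. All names and constants (ReceptiveField, RFWR, w_gen, init_D, the forgetting factor) are illustrative assumptions, not the paper's notation, and the sketch deliberately omits central parts of the full algorithm: the adaptation of each field's distance metric by gradient descent on the weighted, penalized leave-one-out cross-validation error, the use of the cross-validation residual in the regression update, and pruning of redundant fields.

```python
import numpy as np


class ReceptiveField:
    """One locally linear model with a Gaussian receptive field.

    The distance metric D is kept fixed here; the full algorithm adapts it
    by minimizing a penalized local cross-validation error.
    """

    def __init__(self, center, n_in, init_D=25.0, forgetting=0.999):
        self.c = np.asarray(center, dtype=float)   # receptive field center
        self.D = np.eye(n_in) * init_D             # distance metric (fixed in this sketch)
        self.beta = np.zeros(n_in + 1)             # local linear model (slope + bias)
        self.P = np.eye(n_in + 1) * 1e3            # inverse covariance for recursive LS
        self.lam = forgetting                      # forgetting factor

    def activation(self, x):
        """Gaussian weight w = exp(-0.5 (x - c)^T D (x - c))."""
        d = x - self.c
        return float(np.exp(-0.5 * d @ self.D @ d))

    def predict(self, x):
        """Local linear prediction in center-relative coordinates."""
        xt = np.append(x - self.c, 1.0)
        return float(self.beta @ xt)

    def update(self, x, y):
        """Weighted recursive least-squares update of the local model only."""
        w = self.activation(x)
        if w < 1e-10:                              # far outside the field: no update
            return w
        xt = np.append(x - self.c, 1.0)
        Px = self.P @ xt
        gain = Px / (self.lam / w + xt @ Px)
        self.beta = self.beta + gain * (y - self.beta @ xt)
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return w


class RFWR:
    """Minimal sketch: allocate receptive fields as needed, blend their predictions."""

    def __init__(self, n_in, w_gen=0.1):
        self.n_in = n_in
        self.w_gen = w_gen                         # activation threshold for adding a field
        self.fields = []

    def train_point(self, x, y):
        x = np.asarray(x, dtype=float)
        acts = [rf.activation(x) for rf in self.fields]
        if not acts or max(acts) < self.w_gen:
            # no existing field is responsible enough: add one centered at x
            self.fields.append(ReceptiveField(x, self.n_in))
        for rf in self.fields:
            rf.update(x, y)                        # each field learns in isolation

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        num, den = 0.0, 0.0
        for rf in self.fields:
            w = rf.activation(x)
            num += w * rf.predict(x)
            den += w
        return num / den if den > 0 else 0.0
```

Used on a one-dimensional toy problem, e.g. streaming (x, sin(x)) samples through train_point and then querying predict, the model adds new receptive fields only where existing ones respond weakly, which mirrors the resource-allocation behavior described in the abstract; because each field is updated from its own weighted data only, fitting one region does not overwrite models elsewhere.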
