Nonparametric Regression for Learning Nonlinear Transformations

Information processing in animals and artificial movement systems consists of a series of transformations that map sensory signals to intermediate representations and, finally, to motor commands. Given the physical and neuroanatomical differences between individuals and the need for plasticity during development, it is highly likely that such transformations are learned rather than pre-programmed by evolution. Such self-organizing processes, capable of discovering nonlinear dependencies between different groups of signals, are one essential part of prerational intelligence. While neural network algorithms seem to be the natural choice when searching for ways to learn such transformations, this article takes a more careful look at which types of neural networks are actually suited to the requirements of an autonomous learning system. The approach we pursue is guided by recent developments in learning theory that have linked neural network learning to well-established statistical theories. In particular, this new statistical understanding has given rise to neural network systems that are based directly on statistical methods. One family of such methods stems from nonparametric regression. This article compares nonparametric learning with its more widely used parametric counterparts in a non-technical fashion and investigates how the two families differ in their properties and their applicability.
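
To make the nonparametric/parametric contrast concrete, the sketch below compares a nonparametric estimator (the Nadaraya-Watson kernel regressor, a locally weighted average of stored training data) with a parametric alternative (a global polynomial fit by least squares). The toy data, Gaussian kernel, bandwidth, and polynomial degree are illustrative assumptions, not taken from the article; the point is only that the nonparametric model retains the training data and adapts locally, while the parametric model compresses everything into a fixed set of coefficients.

```python
import numpy as np

# Toy data: noisy samples of a nonlinear function (illustrative only).
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(-3.0, 3.0, size=100))
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(x_train.shape)

def nadaraya_watson(x_query, x_train, y_train, bandwidth=0.3):
    """Nonparametric prediction: a kernel-weighted average of training
    targets near each query point (Nadaraya-Watson estimator). All
    training data are kept; no global functional form is assumed."""
    # Gaussian kernel weights between every query and training point.
    diffs = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    return (weights @ y_train) / weights.sum(axis=1)

def parametric_fit(x_train, y_train, degree=3):
    """Parametric alternative: fit a fixed global model (here a cubic
    polynomial) by least squares; the data can then be discarded."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return lambda x_query: np.polyval(coeffs, x_query)

x_query = np.linspace(-3.0, 3.0, 7)
print("nonparametric:", np.round(nadaraya_watson(x_query, x_train, y_train), 2))
print("parametric:   ", np.round(parametric_fit(x_train, y_train)(x_query), 2))
```

In this sketch the nonparametric estimate depends only on training points close to each query (controlled by the bandwidth), whereas the parametric fit is governed everywhere by the same small set of global coefficients, which is the trade-off the article examines.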
