Nonparametric Regression for Learning

In recent years, learning theory has been increasingly influenced by the fact that many learning algorithms can be interpreted, at least in part, in terms of well-established statistical theories. Furthermore, with little modification, several statistical methods can be cast directly into learning algorithms. One family of such methods stems from nonparametric regression. This paper compares nonparametric learning with its more widely used parametric counterparts and investigates how the two families differ in their properties and their applicability.
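The contrast between the two families can be made concrete with a small sketch. It is an illustration only, not the paper's own method: the function names, the Gaussian kernel, the bandwidth of 0.3, and the toy data y = x^2 are all assumptions chosen for brevity. The parametric learner compresses the data into two numbers (a slope and an intercept), while the nonparametric learner, here a Nadaraya-Watson style kernel-weighted average, keeps the training data and forms a local estimate at query time.

```python
import math

def fit_line(xs, ys):
    """Parametric learner: least-squares line, summarised by (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def kernel_predict(xs, ys, query, bandwidth=0.3):
    """Nonparametric learner: Gaussian-kernel weighted average of stored targets.

    The bandwidth value is an assumed smoothing parameter, not taken from the paper.
    """
    weights = [math.exp(-0.5 * ((query - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Toy data (assumed): a nonlinear target sampled on a regular grid.
xs = [i / 10 for i in range(-30, 31)]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
line_at_zero = slope * 0.0 + intercept          # global model, wrong functional form
kernel_at_zero = kernel_predict(xs, ys, 0.0)    # local estimate from nearby data

print("line prediction at 0:  ", line_at_zero)
print("kernel prediction at 0:", kernel_at_zero)
```

On this toy problem the straight line is badly biased near the origin because its functional form is wrong, whereas the kernel estimate stays close to the true value of 0. The price is that the nonparametric learner must store and revisit all training points and depends on a well-chosen bandwidth.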
