Neural Networks and the Bias/Variance Dilemma

Feedforward neural networks trained by error backpropagation are examples of nonparametric regression estimators. We present a tutorial on nonparametric inference and its relation to neural networks, and we use the statistical viewpoint to highlight strengths and weaknesses of neural models. We illustrate the main points with recognition experiments involving artificial data as well as handwritten numerals. By way of conclusion, we suggest that current-generation feedforward neural networks are largely inadequate for difficult problems in machine perception and machine learning, regardless of parallel-versus-serial hardware or other implementation issues. Furthermore, we suggest that the fundamental challenges in neural modeling are about representation rather than learning per se. This last point is supported by additional experiments with handwritten numerals.
