Neural Networks and the Bias/Variance Dilemma

Feedforward neural networks trained by error backpropagation are examples of nonparametric regression estimators. We present a tutorial on nonparametric inference and its relation to neural networks, and we use the statistical viewpoint to highlight strengths and weaknesses of neural models. We illustrate the main points with some recognition experiments involving artificial data as well as handwritten numerals. By way of conclusion, we suggest that current-generation feedforward neural networks are largely inadequate for difficult problems in machine perception and machine learning, regardless of parallel-versus-serial hardware or other implementation issues. Furthermore, we suggest that the fundamental challenges in neural modeling are about representation rather than learning per se. This last point is supported by additional experiments with handwritten numerals.
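
The dilemma named in the title can be illustrated numerically. The following is a minimal sketch and is not taken from the paper: it assumes a one-dimensional k-nearest-neighbor regressor as the nonparametric estimator, a hypothetical target function sin(2πx), and Gaussian noise. All names (`true_f`, `knn_predict`, `bias_variance`) and parameter values are illustrative choices. It resamples many training sets and estimates, averaged over a grid of test points, the squared bias and the variance of the estimator, the two terms that (together with the noise) make up the expected squared error.

```python
import math
import random


def true_f(x):
    # Hypothetical regression function; an assumption made for illustration.
    return math.sin(2 * math.pi * x)


def knn_predict(train, x0, k):
    # Average the responses of the k training points closest to x0.
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def bias_variance(k, n_train=50, n_sets=200, noise=0.3, seed=0):
    """Monte Carlo estimate of (squared bias, variance) for k-NN regression,
    averaged over a fixed grid of test points in (0, 1)."""
    rng = random.Random(seed)
    test_xs = [i / 20 + 0.025 for i in range(20)]
    preds = [[] for _ in test_xs]

    # Draw many independent training sets and record each fitted prediction.
    for _ in range(n_sets):
        train = []
        for _ in range(n_train):
            x = rng.random()
            train.append((x, true_f(x) + rng.gauss(0.0, noise)))
        for j, x0 in enumerate(test_xs):
            preds[j].append(knn_predict(train, x0, k))

    bias2 = 0.0
    var = 0.0
    for x0, p in zip(test_xs, preds):
        mean = sum(p) / len(p)
        bias2 += (mean - true_f(x0)) ** 2                 # squared bias at x0
        var += sum((v - mean) ** 2 for v in p) / len(p)   # variance at x0
    return bias2 / len(test_xs), var / len(test_xs)


if __name__ == "__main__":
    for k in (1, 5, 25):
        b2, v = bias_variance(k)
        print(f"k={k:2d}  bias^2={b2:.4f}  variance={v:.4f}")
```

Under these assumptions, a small k gives an estimator that tracks each training set closely (low bias, high variance), while a large k smooths heavily (high bias, low variance). The same trade-off governs the choice of network size or training time when a feedforward network is viewed as a nonparametric estimator.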
