Feedforward Neural Network Methodology

From the Publisher: This monograph provides a thorough and coherent introduction to the mathematical properties of feedforward neural networks and to the computationally intensive methodology that has enabled their highly successful application to complex problems of pattern classification, forecasting, regression, and nonlinear systems modeling.