Backpropagation Family Album

Training of multi-layer feed-forward artificial neural networks via the backpropagation algorithm has a popularity and general appeal exceeded only by the simplicity of its implementation and the way in which the algorithm may be improved upon or specialised for any given problem. These factors have given rise to the current situation where there are so many algorithms which trace their heritage to backpropagation that it is almost impossible for any one person to be aware of all of the developments, and it is even more difficult for someone new to the area of feed-forward networks to choose an appropriate member of the backpropagation family for the task envisaged. This is a survey of backpropagation learning algorithms and techniques. It brings together members of the backpropagation family which have been developed away from their siblings over the years, from its conception to modern modifications to the algorithm. In this taxonomy, the family members are broken into five major branches of the tree: Algorithms, Heuristics, Regularisers, Network Techniques and Generative Networks. Each branch has grown away from the trunk in a different direction, and only rarely have the limbs touched, generally only where two minor branches have come together. This is a place where many branches meet. It is my belief that in bringing together some of these distant relations, new combinations of techniques can be investigated which marry the particular advantages of disparate branches in the family and breed new methods carrying the best attributes of their predecessors. There is no intention for this document to investigate issues of speed of training, or any performance measures, on the algorithms presented, but merely to summarise the details of the variations on backpropagation that have been developed.
