Efficient Block Training of Multilayer Perceptrons

The attractive possibility of applying layerwise block training algorithms to multilayer perceptrons (MLPs), which offers an initial advantage in computational effort, is refined in this article by introducing a sensitivity correction factor into the formulation. This yields a clear performance advantage, which we verify in several applications. The reasons for this advantage are discussed and related to implicit connections with second-order techniques, natural-gradient formulations based on Fisher's information matrix, and sample selection. Extensions to recurrent networks and other lines of research are suggested at the close of the article.
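
To make the layerwise block idea concrete, the following is a minimal sketch of the general scheme, assuming a single-hidden-layer MLP with tanh hidden units and a linear output: each block (layer) is solved by ridge-regularized linear least squares, with hidden-layer targets obtained by pushing the output residual back through the output weights. The function name fit_block_mlp, the ridge term, and in particular the per-sample sensitivity weighting S are illustrative assumptions; the article's exact correction factor is not reproduced here.

```python
# Minimal sketch of layerwise (block) least-squares training of an MLP
# with a sensitivity-style correction. Illustrative assumptions: one
# hidden layer, tanh units, linear output, ridge regularization; the
# weighting S below stands in for the article's correction factor.
import numpy as np

rng = np.random.default_rng(0)

def fit_block_mlp(X, y, n_hidden=10, n_sweeps=20, ridge=1e-3):
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])          # inputs with bias column
    W1 = rng.normal(scale=0.5, size=(d + 1, n_hidden))
    w2 = rng.normal(scale=0.5, size=(n_hidden + 1,))
    for _ in range(n_sweeps):
        # Block 1: given hidden activations, solve the output layer by
        # ridge-regularized linear least squares.
        H = np.tanh(Xb @ W1)
        Hb = np.hstack([H, np.ones((n, 1))])
        A = Hb.T @ Hb + ridge * np.eye(n_hidden + 1)
        w2 = np.linalg.solve(A, Hb.T @ y)
        # Block 2: assign linearized targets to the hidden layer by
        # pushing the output residual back through the output weights,
        # then inverting the activation (clipped to keep arctanh finite).
        r = y - Hb @ w2                            # output residual
        H_target = H + np.outer(r, w2[:-1]) / (w2[:-1] @ w2[:-1] + 1e-12)
        H_target = np.clip(H_target, -0.999, 0.999)
        Z_target = np.arctanh(H_target)            # pre-activation targets
        # Sensitivity correction (illustrative): weight each sample/unit
        # by how strongly its pre-activation actually affects the output.
        S = (1.0 - H**2) * np.abs(w2[:-1])         # n x n_hidden
        for j in range(n_hidden):
            sw = S[:, j]
            Xw = Xb * sw[:, None]                  # row-weighted design
            A1 = Xb.T @ Xw + ridge * np.eye(d + 1)
            W1[:, j] = np.linalg.solve(A1, Xw.T @ Z_target[:, j])
    return W1, w2

# Usage: fit a noisy 1-D regression problem.
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(2 * X[:, 0]) + 0.05 * rng.normal(size=200)
W1, w2 = fit_block_mlp(X, y)
H = np.tanh(np.hstack([X, np.ones((200, 1))]) @ W1)
pred = np.hstack([H, np.ones((200, 1))]) @ w2
print("train MSE:", np.mean((pred - y) ** 2))
```

Solving each block as a weighted linear least-squares problem is what gives the layerwise approach its initial computational advantage over plain gradient descent; the sensitivity weighting reweights the hidden-layer fit by how much each pre-activation change can actually reduce the output error, which is where the connection to second-order and natural-gradient methods arises.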
