An efficient constrained training algorithm for feedforward networks

A novel algorithm is presented that supplements the training phase of feedforward networks with various forms of information about desired learning properties. This information is expressed as conditions that must be satisfied in addition to minimizing the usual mean square error cost function. The purpose of these conditions is to improve convergence, learning speed, and generalization through prompt activation of the hidden units, optimal alignment of successive weight vector offsets, elimination of excessive hidden nodes, and regulation of the magnitude of search steps in weight space. The algorithm is applied to several small- and large-scale binary benchmark training tasks, to test its convergence ability and learning speed, and to a large-scale OCR problem, to test its generalization capability. Its performance, in terms of the percentage of trials trapped in local minima, learning speed, and generalization ability, is evaluated and found superior to that of the backpropagation algorithm and its variants, especially when the statistical significance of the results is taken into account.
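To make the constrained-update idea concrete, the sketch below trains a one-hidden-layer network on the XOR benchmark while imposing two of the conditions mentioned above on each weight update. This is a minimal illustration under stated assumptions, not the authors' exact constrained optimization: the alignment of successive weight vector offsets is approximated here by a momentum-style blend with the previous offset, and the regulation of the search step by clipping the step norm; the parameters `align_coef` and `delta_max` are hypothetical illustrative choices.

```python
# Illustrative sketch only, NOT the paper's exact algorithm.
# A one-hidden-layer network is trained on XOR by gradient descent on
# the mean square error; the raw step is (a) blended with the previous
# weight offset to keep successive offsets aligned, and (b) rescaled so
# its norm never exceeds delta_max (step-magnitude regulation).
import numpy as np

rng = np.random.default_rng(0)

# XOR: a classic small-scale binary benchmark task.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three hidden units; the last column of each matrix is a bias weight.
W1 = rng.normal(scale=0.5, size=(3, 3))   # hidden layer weights
W2 = rng.normal(scale=0.5, size=(1, 4))   # output layer weights

prev_step = [np.zeros_like(W1), np.zeros_like(W2)]
align_coef, delta_max, lr = 0.7, 0.5, 0.5  # hypothetical settings

for epoch in range(5000):
    # Forward pass; constant 1s appended as bias inputs.
    Xb = np.hstack([X, np.ones((4, 1))])
    H = sigmoid(Xb @ W1.T)
    Hb = np.hstack([H, np.ones((4, 1))])
    Y = sigmoid(Hb @ W2.T)

    # Mean square error and its gradients (standard backpropagation).
    E = Y - T
    mse = np.mean(E ** 2)
    dY = 2 * E * Y * (1 - Y) / len(X)
    g2 = dY.T @ Hb
    dH = (dY @ W2[:, :3]) * H * (1 - H)
    g1 = dH.T @ Xb

    # Blend the descent direction with the previous offset so that
    # successive weight vector offsets stay aligned.
    step = [-lr * g1 + align_coef * prev_step[0],
            -lr * g2 + align_coef * prev_step[1]]

    # Regulate the magnitude of the search step in weight space.
    norm = np.sqrt(sum(np.sum(s ** 2) for s in step))
    if norm > delta_max:
        step = [s * (delta_max / norm) for s in step]

    W1 += step[0]
    W2 += step[1]
    prev_step = step

print("final MSE:", mse, "outputs:", Y.ravel().round(3))
```

In the paper's framework these conditions are enforced as explicit constraints within the optimization itself rather than, as in this sketch, as heuristic modifications of the gradient step; the sketch only conveys the flavor of augmenting MSE minimization with additional requirements on the weight trajectory.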
