High-order and multilayer perceptron initialization

Proper initialization is one of the most important prerequisites for fast convergence of feedforward neural networks such as high-order and multilayer perceptrons. This paper aims to determine the optimal variance (or range) of the initial weights and biases, which is the principal parameter of random initialization methods for both types of network. An overview of random weight initialization methods for multilayer perceptrons is presented. These methods are tested extensively on eight real-world benchmark data sets over a broad range of initial weight variances, using more than 30,000 simulations, with the aim of finding the best weight initialization method for multilayer perceptrons. For high-order networks, a large number of experiments (more than 200,000 simulations) were performed, using three weight distributions, three activation functions, several network orders, and the same eight data sets. The results of these experiments are compared with weight initialization techniques for multilayer perceptrons, leading to the proposal of a suitable initialization method for high-order perceptrons. The conclusions on the initialization methods for both types of network are justified by sufficiently small confidence intervals on the mean convergence times.
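To make the notion of "initial weight variance as the principal parameter" concrete, the following is a minimal sketch (not the paper's own code) of drawing a layer's initial weights with a prescribed variance, for either a uniform or a normal distribution. The function name and layer sizes are illustrative assumptions; the only substantive facts used are the standard identities that a uniform distribution on [-r, r] has variance r²/3 and a zero-mean normal distribution has variance σ².

```python
import numpy as np

def init_layer_weights(fan_in, fan_out, variance, dist="uniform", rng=None):
    """Draw an initial weight matrix with a prescribed variance.

    For a uniform distribution on [-r, r] the variance is r**2 / 3,
    so r = sqrt(3 * variance); for a zero-mean normal distribution
    the standard deviation is simply sqrt(variance).
    """
    rng = rng or np.random.default_rng()
    if dist == "uniform":
        r = np.sqrt(3.0 * variance)
        return rng.uniform(-r, r, size=(fan_in, fan_out))
    elif dist == "normal":
        return rng.normal(0.0, np.sqrt(variance), size=(fan_in, fan_out))
    raise ValueError(f"unknown distribution: {dist}")

# Example: a hypothetical 4-8-1 multilayer perceptron, variance 0.01 per weight
W1 = init_layer_weights(4, 8, 0.01)
W2 = init_layer_weights(8, 1, 0.01)
```

Sweeping `variance` over a grid and measuring convergence time for each value is the kind of experiment the simulations described above perform at scale.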
