Saturation in PSO neural network training: Good or evil?

Particle swarm optimisation has previously been applied successfully as a neural network training algorithm, often outperforming traditional gradient-based approaches. However, recent studies have shown that particle swarm optimisation scales poorly and performs badly on high-dimensional neural network architectures. This paper hypothesises that hidden layer saturation is a significant factor contributing to the poor training performance of particle swarms, hindering good performance regardless of architecture size. A selection of classification problems is used to test this hypothesis. It is found that although a certain degree of saturation is necessary for successful training, higher degrees of saturation ultimately lead to poor generalisation. Possible factors leading to saturation are suggested, and means of alleviating saturation in particle swarms through the weight initialisation range, maximum velocity, and search space boundaries are analysed. This paper is intended as a preface to a more in-depth study of the problem of saturation in particle swarm optimisation as a neural network training algorithm.
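
To make the notion of hidden layer saturation concrete, the Python sketch below shows one plausible way to quantify saturation: the fraction of sigmoid hidden unit activations pushed into the flat tails of the activation function. This is an illustrative measure under stated assumptions (a single hidden layer, sigmoid activations, a 0.9 threshold, and the example initialisation ranges), not necessarily the exact measure used in the paper.

import numpy as np

def sigmoid(x):
    # Clip the net input to avoid overflow warnings when weights are very large.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -60.0, 60.0)))

def hidden_saturation(X, W_hidden, b_hidden, threshold=0.9):
    # Fraction of hidden activations with |2a - 1| > threshold, i.e. activations
    # pushed towards the flat asymptotes of the sigmoid (near 0 or 1).
    a = sigmoid(X @ W_hidden + b_hidden)
    return float(np.mean(np.abs(2.0 * a - 1.0) > threshold))

# Toy illustration: wider weight initialisation ranges drive more hidden units
# into the sigmoid tails, i.e. higher saturation before training even begins.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))            # 200 patterns, 8 inputs (assumed)
for init_range in (0.5, 5.0, 50.0):          # candidate initialisation ranges (assumed)
    W = rng.uniform(-init_range, init_range, (8, 10))   # 10 hidden units (assumed)
    b = rng.uniform(-init_range, init_range, 10)
    print(f"init range +/-{init_range}: saturation = {hidden_saturation(X, W, b):.2f}")

In a particle swarm setting, the same measure could be tracked per particle over the course of training to relate the weight initialisation range, maximum velocity, and search space boundaries to the degree of hidden layer saturation observed.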
