Weight regularisation in particle swarm optimisation neural network training

Applying weight regularisation to gradient-descent-based neural network training methods such as backpropagation has been shown to improve the generalisation performance of a neural network. However, despite promising early results, existing applications of weight regularisation to particle swarm optimisation remain very limited. This paper proposes adding a regularisation penalty term to the objective function of the particle swarm. The impact of different penalty terms on the performance of neural networks trained by both backpropagation and particle swarm optimisation is analysed. Swarm behaviour under weight regularisation is also studied, showing that weight regularisation yields smaller neural network architectures and more convergent swarms.
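
As a minimal sketch of the general idea, the Python example below trains a small feedforward network with inertia-weight gbest PSO, where the swarm's objective is the training error plus an L2 (weight-decay) penalty. The architecture, the penalty coefficient `lam`, the PSO parameters, and all identifiers are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [2, 4, 1]  # assumed toy architecture: 2 inputs, 4 hidden, 1 output
dim = sum(i * o + o for i, o in zip(sizes[:-1], sizes[1:]))  # flat weight vector length

def forward(x, w):
    """Evaluate the network; each particle position w packs all weights and biases."""
    a, off = x, 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = w[off:off + n_in * n_out].reshape(n_in, n_out); off += n_in * n_out
        b = w[off:off + n_out]; off += n_out
        a = np.tanh(a @ W + b)
    return a

def fitness(w, x, y, lam=1e-3):
    """Regularised objective: MSE plus the L2 weight-decay penalty (lam is assumed)."""
    return np.mean((forward(x, w) - y) ** 2) + lam * np.sum(w ** 2)

# toy regression data (illustrative)
x = rng.uniform(-1, 1, (64, 2))
y = np.sin(x[:, :1] * x[:, 1:])

# standard inertia-weight gbest PSO minimising the regularised objective
n, w_in, c1, c2 = 20, 0.729, 1.494, 1.494
pos = rng.uniform(-1, 1, (n, dim))
vel = np.zeros((n, dim))
pbest = pos.copy()
pbest_f = np.array([fitness(p, x, y) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(200):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    vel = w_in * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos += vel
    f = np.array([fitness(p, x, y) for p in pos])
    improved = f < pbest_f                      # update personal bests
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()      # update global best
```

Because the penalty grows with the squared weight magnitudes, particles are pulled towards smaller-norm solutions, which is the mechanism behind the smaller effective architectures and more convergent swarms reported in the abstract.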
