Analysis of activation functions for particle swarm optimised feedforward neural networks

Previous studies of feedforward neural networks (FFNNs) trained by particle swarm optimisation (PSO) have found that the asymptotically bounded activation functions commonly used in such networks have a significant impact on both swarm behaviour and FFNN performance. A number of alternative activation functions have, however, been developed that offer potential advantages over the popular choices. The purpose of this study is to compare the Elliott, rectified linear, leaky rectified linear, and softplus functions against the sigmoid and hyperbolic tangent functions on classification and regression problems. It is shown that the rectified linear function matches the performance of the sigmoid and hyperbolic tangent functions without the disadvantages of bounded activation functions. Adaptive versions of the functions are also compared on unscaled data sets using the PSO lambda-gamma algorithm. It is shown that shallower activation gradients benefit accuracy, but that the resulting FFNNs generalise worse than networks trained on scaled data.
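For concreteness, the sketch below gives a minimal NumPy rendering of the six activation functions compared in the study, plus a lambda-gamma style adaptive wrapper. The exact parameterisations (the leaky slope of 0.01, the Elliott form x / (1 + |x|), and the adaptive form gamma * g(lambda * x), where lambda controls steepness and gamma the output range) are common conventions assumed here for illustration, not settings confirmed by the paper.

```python
import numpy as np

def sigmoid(x):
    # Bounded in (0, 1); saturates for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Bounded in (-1, 1); zero-centred sigmoid relative.
    return np.tanh(x)

def elliott(x):
    # Elliott's exponential-free bounded function, x / (1 + |x|).
    return x / (1.0 + np.abs(x))

def relu(x):
    # Rectified linear: unbounded above, zero for negative inputs.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # alpha is the slope for negative inputs; 0.01 is a conventional default.
    return np.where(x > 0.0, x, alpha * x)

def softplus(x):
    # Smooth approximation of the rectifier (naive form; may overflow for large x).
    return np.log1p(np.exp(x))

def adaptive(g, x, lam, gamma):
    # Lambda-gamma style adaptive activation: lam scales the net input
    # (steepness), gamma scales the output range. In PSO lambda-gamma
    # training these parameters are optimised alongside the weights,
    # e.g. by appending them to each particle's position vector.
    return gamma * g(lam * x)

if __name__ == "__main__":
    x = np.linspace(-3.0, 3.0, 7)
    print(relu(x))
    print(adaptive(np.tanh, x, lam=0.5, gamma=2.0))
```

A shallower lambda flattens the activation's gradient, which is the mechanism the abstract refers to when noting that shallower gradients benefited accuracy on unscaled data.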
