An Analysis of Activation Function Saturation in Particle Swarm Optimization Trained Neural Networks

The activation functions used in an artificial neural network define how the nodes of the network respond to input, directly influence the shape of the error surface, and thus affect the difficulty of the neural network training problem. The choice of activation function is therefore a significant decision that must be made when applying a neural network to a problem. One issue that must be considered when selecting an activation function is activation function saturation. Saturation occurs when a bounded activation function primarily outputs values close to its boundaries. Excessive saturation damages the network’s ability to encode information and may prevent successful training. Common functions such as the logistic and hyperbolic tangent functions have been shown to saturate when the neural network is trained using particle swarm optimization. This study proposes a new measure of activation function saturation, evaluates the saturation behavior of eight common activation functions, and assesses six mechanisms for controlling activation function saturation in particle swarm optimization-based neural network training. Activation functions that result in low levels of saturation are identified. For each activation function, recommendations are made regarding which saturation control mechanism is most effective at reducing saturation.
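To make the phenomenon concrete, the sketch below computes a simple frequency-based saturation proxy: the fraction of hidden-unit activations that land within a tolerance eps of a bounded activation function’s limits. This proxy, the function names, and the eps threshold are illustrative assumptions chosen for exposition; they are not the saturation measure proposed in this study.

```python
# Illustrative sketch only: the study's proposed saturation measure is not
# reproduced here. This computes a simple frequency-based proxy -- the
# fraction of activations that fall within a tolerance `eps` of the
# activation function's bounds -- for the logistic function.
import numpy as np

def logistic(x):
    """Logistic (sigmoid) activation, bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def saturation_fraction(activations, lower, upper, eps=0.05):
    """Fraction of activations within eps of either bound (hypothetical proxy)."""
    a = np.asarray(activations)
    return np.mean((a <= lower + eps) | (a >= upper - eps))

# Bounded activations saturate when net inputs grow large in magnitude,
# e.g. when training inflates the weights.
rng = np.random.default_rng(0)
small_inputs = rng.normal(0.0, 1.0, size=10_000)   # modest net inputs
large_inputs = rng.normal(0.0, 10.0, size=10_000)  # inflated net inputs

print(saturation_fraction(logistic(small_inputs), 0.0, 1.0))  # low saturation
print(saturation_fraction(logistic(large_inputs), 0.0, 1.0))  # high saturation
```

When the net inputs are drawn at a ten-times-larger scale, mimicking the inflated weight magnitudes that the particle swarm optimization training literature associates with saturation, the bounded logistic output clusters near 0 and 1 and the proxy reports heavy saturation.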
