Improving the Performance of Neural Networks with an Ensemble of Activation Functions

Activation functions play an important role in neural networks by introducing non-linearity, which makes them one of the essential building blocks of any network. However, selecting an appropriate activation function is difficult: a model's performance depends strongly on how well the chosen activation function suits the dataset, and in practice the choice that improves classification accuracy is still found by trial and error. As a solution to this problem, we propose an ensemble of activation functions combined by majority voting, which significantly improves model accuracy in a classification setting. The proposed model is evaluated on four benchmark datasets: MNIST, Fashion MNIST, Semeion, and ARDIS IV. The results show that the proposed model performs appreciably better than traditional methods such as the Convolutional Neural Network (CNN), Support Vector Machine (SVM), and Recurrent Neural Network (RNN).
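
The core idea described above is to train several networks that differ only in their activation function and to fuse their class predictions by majority vote. The sketch below is a minimal illustration of that scheme, assuming a small Keras CNN trained on MNIST with relu, tanh, and elu as ensemble members; the architecture, the activation set, and the training settings are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a majority-voting activation ensemble (assumed setup,
# not the authors' exact configuration).
import numpy as np
from tensorflow import keras

ACTIVATIONS = ["relu", "tanh", "elu"]  # assumed ensemble members
NUM_CLASSES = 10


def build_model(activation: str) -> keras.Model:
    """Small CNN for 28x28 grayscale digits; only the activation varies."""
    return keras.Sequential([
        keras.layers.Input(shape=(28, 28, 1)),
        keras.layers.Conv2D(32, 3, activation=activation),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(64, 3, activation=activation),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation=activation),
        keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])


def majority_vote(member_labels: np.ndarray) -> np.ndarray:
    """member_labels: (n_members, n_samples) array of predicted class labels.
    Returns the most frequent label per sample (ties go to the lowest label)."""
    counts = np.zeros((NUM_CLASSES, member_labels.shape[1]), dtype=int)
    for labels in member_labels:
        counts[labels, np.arange(labels.shape[0])] += 1
    return counts.argmax(axis=0)


if __name__ == "__main__":
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = x_train[..., None].astype("float32") / 255.0
    x_test = x_test[..., None].astype("float32") / 255.0

    member_preds = []
    for act in ACTIVATIONS:
        model = build_model(act)
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_train, y_train, epochs=3, batch_size=128, verbose=0)
        # Each member votes with its hard class prediction on the test set.
        member_preds.append(model.predict(x_test, verbose=0).argmax(axis=1))

    ensemble_pred = majority_vote(np.stack(member_preds))
    print("Ensemble accuracy:", float((ensemble_pred == y_test).mean()))
```

The sketch only demonstrates the ensembling mechanism: hard predictions from otherwise identical networks are counted per class and the plurality class wins. The paper's actual choice of base networks, activation pool, and tie-breaking rule may differ.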
