A new pruning heuristic based on variance analysis of sensitivity information

Architecture selection is an important aspect of neural network (NN) design, since the architecture determines the trade-off between performance and computational complexity. Sensitivity analysis has been used successfully to prune irrelevant parameters from feedforward NNs. This paper presents a new pruning algorithm that uses sensitivity analysis to quantify the relevance of input and hidden units. A new statistical pruning heuristic, based on variance analysis, is proposed to decide which units to prune. The basic idea is that a parameter whose sensitivity variance is not significantly different from zero is irrelevant and can be removed. Experimental results show that the new pruning algorithm correctly prunes irrelevant input and hidden units; the algorithm is also compared with standard pruning algorithms.
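
The abstract does not spell out the test statistic, but a common way to formalise "variance not significantly different from zero" is a chi-squared variance nullity test. Below is a minimal sketch of one such test, assuming per-pattern sensitivities have already been computed (e.g., as derivatives of the network output with respect to each input or hidden unit); the function name, the hypothesised null variance `sigma0_sq`, the significance level `alpha`, and the use of `numpy`/`scipy` are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.stats import chi2

def variance_nullity_candidates(sensitivities, sigma0_sq=1e-4, alpha=0.01):
    """Return a boolean mask of units whose sensitivity variance is not
    significantly different from zero (i.e., pruning candidates).

    sensitivities : (N, U) array; sensitivities[n, u] is the sensitivity
                    of the network output to unit u on training pattern n.
    sigma0_sq     : small hypothesised "null" variance (assumed value).
    alpha         : significance level of the one-sided, lower-tail test.
    """
    N = sensitivities.shape[0]
    sample_var = sensitivities.var(axis=0, ddof=1)  # unbiased variance per unit
    # Under H0 (true variance == sigma0_sq), this statistic follows a
    # chi-squared distribution with N - 1 degrees of freedom.
    stat = (N - 1) * sample_var / sigma0_sq
    critical = chi2.ppf(alpha, df=N - 1)            # lower-tail critical value
    return stat < critical                          # True -> candidate for pruning

# Example: a unit with near-constant sensitivity is flagged for pruning,
# while a unit whose sensitivity varies across patterns is kept.
rng = np.random.default_rng(0)
S = np.column_stack([rng.normal(0.0, 1e-4, 500),   # irrelevant unit
                     rng.normal(0.0, 0.5, 500)])   # relevant unit
print(variance_nullity_candidates(S))              # -> [ True False]
```

Since a unit with constant but nonzero sensitivity can still influence the output, a practical implementation would likely also require the mean sensitivity to be close to zero before pruning a unit.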
