Variable selection with neural networks

Abstract

In this paper, we present three neural-network-based methods for variable selection. OCD (Optimal Cell Damage) is a pruning method that evaluates the usefulness of each input variable and prunes the least useful ones; it is related to the Optimal Brain Damage method of Le Cun et al. Regularization theory constrains estimators by adding a penalty term to the cost function used to train a neural network. In the Bayesian framework, this additional term can be interpreted as the negative log of a prior distribution over the weights. We propose two priors (a Gaussian and a Gaussian mixture) and show that this regularization approach selects efficient subsets of variables. We compare our methods with conventional statistical selection procedures and show that they improve significantly on them.
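The abstract only names the ingredients of OCD, so what follows is a minimal NumPy sketch of the idea under stated assumptions, not the authors' implementation. It assumes a one-hidden-layer tanh network trained by plain gradient descent. Each weight gets the Optimal Brain Damage saliency 0.5 * H_kk * w_k^2, where the diagonal Hessian entry H_kk is estimated here by central finite differences (Le Cun et al. obtain it with a backpropagation-style pass instead). The saliency of an input variable is assumed to be the sum of the saliencies of the weights leaving it. The data, the network size and all function names (init_params, loss, train, ocd_saliencies) are hypothetical.

```python
import numpy as np

def init_params(n_in, n_hidden, rng, scale=0.05):
    # Hypothetical one-hidden-layer tanh network with small random weights.
    return {
        "W1": rng.normal(scale=scale, size=(n_in, n_hidden)),  # row i = weights leaving input i
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(scale=scale, size=n_hidden),
        "b2": np.zeros(1),
    }

def loss(p, X, y):
    # Half mean squared error of the network output.
    h = np.tanh(X @ p["W1"] + p["b1"])
    r = h @ p["w2"] + p["b2"] - y
    return 0.5 * np.mean(r ** 2)

def train(p, X, y, lr=0.1, epochs=2000):
    # Full-batch gradient descent with hand-written backpropagation.
    n = len(y)
    for _ in range(epochs):
        h = np.tanh(X @ p["W1"] + p["b1"])
        r = h @ p["w2"] + p["b2"] - y                # residuals, shape (n,)
        dh = np.outer(r, p["w2"]) * (1.0 - h ** 2)   # error at hidden pre-activations
        p["w2"] -= lr * (h.T @ r) / n
        p["b2"] -= lr * r.mean()
        p["W1"] -= lr * (X.T @ dh) / n
        p["b1"] -= lr * dh.mean(axis=0)
    return p

def ocd_saliencies(p, X, y, eps=1e-3):
    # OBD saliency per weight: 0.5 * H_kk * w_k**2, with H_kk estimated by central
    # finite differences. Assumed OCD rule: input saliency = sum over its fan-out weights.
    W1 = p["W1"]                       # same array object, so edits below affect loss()
    base = loss(p, X, y)
    S = np.zeros(W1.shape[0])
    for i in range(W1.shape[0]):       # input variable i
        for j in range(W1.shape[1]):   # weight from input i to hidden unit j
            w = W1[i, j]
            W1[i, j] = w + eps
            up = loss(p, X, y)
            W1[i, j] = w - eps
            down = loss(p, X, y)
            W1[i, j] = w               # restore the trained weight
            h_kk = (up - 2.0 * base + down) / eps ** 2
            S[i] += 0.5 * h_kk * w ** 2
    return S

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = np.tanh(X[:, 0]) - X[:, 1]         # hypothetical target: inputs 2 and 3 are irrelevant
p = train(init_params(4, 4, rng), X, y)
S = ocd_saliencies(p, X, y)
ranking = np.argsort(S)                # least useful inputs first
print(S, ranking)
```

Inputs at the front of the ranking are the candidates for removal. A complete pruning procedure would presumably remove the least salient input, retrain, and repeat while a held-out error stays acceptable; that loop is omitted to keep the sketch short.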
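The Bayesian reading of the penalty term can be made concrete. Minimizing data error plus penalty is the same as maximizing the posterior when the penalty equals −log p(w). The hypothetical sketch below writes both priors as penalties together with their gradients, which would simply be added to the backpropagated gradients during training. The abstract does not give the priors' parameters, so the following are all illustrative assumptions: the precision alpha, the two-component zero-mean mixture with weights (0.5, 0.5) and widths (0.05, 1.0), and the threshold in the hypothetical helper selected_inputs.

```python
import numpy as np

def gaussian_penalty(w, alpha=1.0):
    # -log of a zero-mean Gaussian prior with precision alpha, constants dropped:
    # the classic weight-decay term and its gradient.
    w = np.asarray(w, dtype=float)
    return 0.5 * alpha * np.sum(w ** 2), alpha * w

def mixture_penalty(w, pis=(0.5, 0.5), sigmas=(0.05, 1.0)):
    # -log of an assumed zero-mean Gaussian-mixture prior, summed over a 1-D weight array.
    w = np.asarray(w, dtype=float).reshape(-1)
    pis, sigmas = np.asarray(pis, dtype=float), np.asarray(sigmas, dtype=float)
    log_comp = (np.log(pis) - 0.5 * np.log(2 * np.pi * sigmas ** 2)
                - w[:, None] ** 2 / (2 * sigmas ** 2))   # shape (n_weights, n_components)
    log_p = np.logaddexp.reduce(log_comp, axis=1)        # stable log prior density per weight
    resp = np.exp(log_comp - log_p[:, None])             # component responsibilities
    grad = w * np.sum(resp / sigmas ** 2, axis=1)        # d(-log p)/dw
    return -np.sum(log_p), grad

def selected_inputs(W1, threshold=0.1):
    # Hypothetical rule: keep input i if some weight leaving it (row i of W1)
    # still exceeds the threshold in magnitude after training.
    return np.flatnonzero(np.max(np.abs(W1), axis=1) > threshold)

w = np.array([0.05, 1.0])
print(gaussian_penalty(w)[1])   # gradient proportional to w
print(mixture_penalty(w)[1])    # strong pull on the small weight, mild on the large one
```

The Gaussian prior shrinks every weight in proportion to its size. The mixture's narrow component instead pushes small weights hard toward zero, while large weights are only mildly penalized by the broad component. Once all the weights leaving an input have collapsed toward zero, that input can be dropped.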

References

[1] Ehud D. Karnin et al. A simple procedure for pruning back-propagation trained neural networks, 1990, IEEE Trans. Neural Networks.

[2] Roberto Battiti et al. Using mutual information for selecting features in supervised neural net learning, 1994, IEEE Trans. Neural Networks.

[3] Stuart C. Schwartz et al. Underwater noises: Statistical modeling, detection, and normalization, 1988.

[4] G. C. Tiao et al. Bayesian inference in statistical analysis, 1973.

[5] David E. Rumelhart et al. Generalization by Weight-Elimination with Application to Forecasting, 1990, NIPS.

[6] Yves Chauvin. Dynamic Behavior of Constrained Back-Propagation Networks, 1989, NIPS.

[7] Tomaso A. Poggio et al. Regularization Theory and Neural Networks Architectures, 1995, Neural Computation.

[8] Josef Kittler et al. Pattern recognition: a statistical approach, 1982.

[9] J. Habbema et al. Selection of Variables in Discriminant Analysis by F-statistic and Error Rate, 1977.

[10] M. F. Møller et al. Efficient Training of Feed-Forward Neural Networks, 1993.

[11] M. Thompson. Selection of Variables in Multiple Regression: Part II. Chosen Procedures, Computations and Examples, 1978.

[12] Chris Bishop et al. Exact Calculation of the Hessian Matrix for the Multilayer Perceptron, 1992, Neural Computation.

[13] Wray L. Buntine et al. Bayesian Back-Propagation, 1991, Complex Syst.

[14] Ferdinand Hergert et al. Improving model selection by nonconvergent methods, 1993, Neural Networks.

[15] H. Akaike. A new look at the statistical model identification, 1974.

[16] Alan J. Miller et al. Subset Selection in Regression, 1991.

[17] Geoffrey E. Hinton et al. Simplifying neural networks by soft weight-sharing, 1992.

[18] Patrick Gallinari et al. Variable Selection with Optimal Cell Damage, 1994.

[19] Babak Hassibi et al. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon, 1992, NIPS.

[20] Geoffrey E. Hinton et al. Simplifying Neural Networks by Soft Weight-Sharing, 1992, Neural Computation.

[21] David J. C. MacKay et al. Bayesian Interpolation, 1992, Neural Computation.

[22] Yves Chauvin et al. A Back-Propagation Algorithm with Optimal Use of Hidden Units, 1988, NIPS.

[23] Geoffrey E. Hinton et al. Learning internal representations by error propagation, 1986.

[24] André Kouam et al. Approches connexionnistes pour la prévision des séries temporelles [Connectionist approaches to time-series forecasting], 1993.

[25] G. Schwarz. Estimating the Dimension of a Model, 1978.

[26] P. Gallinari et al. Cooperation of neural nets and task decomposition, 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.

[27] Lorien Y. Pratt et al. Comparing Biases for Minimal Network Construction with Back-Propagation, 1988, NIPS.

[28] M. Thompson. Selection of Variables in Multiple Regression: Part I. A Review and Evaluation, 1978.

[29] A. Atkinson. Subset Selection in Regression, 1992.

[30] Yann LeCun et al. Transforming Neural-Net Output Levels to Probability Distributions, 1990, NIPS.

[31] L. S. Feldt et al. The Selection of Variables in Multiple Regression Analysis, 1970.

[32] Yann LeCun et al. Optimal Brain Damage, 1989, NIPS.

[33] T. J. Mitchell et al. Bayesian Variable Selection in Linear Regression, 1988.

[34] H. H. Thodberg. Ace of Bayes: Application of Neural, 1993.

[35] David J. C. MacKay et al. A Practical Bayesian Framework for Backpropagation Networks, 1992, Neural Computation.