Bayesian Regularization and Pruning Using a Laplace Prior

Standard techniques for improved generalization from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation, with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy suggests a Laplace rather than a Gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error, and (2) those failing to achieve this sensitivity, which therefore vanish. Since the critical value is determined adaptively during training, pruning, in the sense of setting weights to exactly zero, becomes an automatic consequence of regularization alone. The number of free parameters is also reduced automatically as weights are pruned. A comparison is made with the results of MacKay using the evidence framework and a Gaussian regularizer.
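
To make the two weight classes concrete, the following is a minimal worked sketch of the optimality conditions implied by a Laplace prior. The notation is assumed here rather than taken from the abstract: E_D denotes the data error, alpha the regularization constant (the adaptively determined critical value), and w_i an individual weight.

```latex
% Sketch of the Laplace-prior objective and its stationarity conditions.
% E_D, \alpha and w_i are assumed notation, not taken from the abstract.

% Negative log-posterior with a Laplace (L1) prior over the weights:
M(\mathbf{w}) \;=\; E_D(\mathbf{w}) \;+\; \alpha \sum_i |w_i|

% At a minimum of M, each weight falls into one of the two classes:

% (1) surviving weights share a common sensitivity to the data error,
w_i \neq 0 \;\Longrightarrow\; \left| \frac{\partial E_D}{\partial w_i} \right| = \alpha ,

% (2) weights whose sensitivity fails to reach \alpha sit at exactly zero,
w_i = 0 \;\Longrightarrow\; \left| \frac{\partial E_D}{\partial w_i} \right| \le \alpha .
```

Because the Laplace penalty contributes a gradient of constant magnitude alpha everywhere except at zero, any weight whose data-error gradient cannot sustain that magnitude is held at exactly zero, which is why pruning requires no separate thresholding or post-processing step under this prior.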

[1]  W. Dearborn, Experiments in learning, 1910.

[2]  A. N. Tikhonov et al., Solutions of ill-posed problems, 1977.

[3]  Philip E. Gill et al., Practical optimization, 1981.

[4]  Geoffrey E. Hinton et al., Experiments on Learning by Back Propagation, 1986.

[5]  Roger Fletcher et al., Practical methods of optimization (2nd ed.), 1987.

[6]  Lawrence D. Jackel et al., Large Automatic Learning, Rule Extraction, and Generalization, Complex Systems, 1987.

[7]  R. Fletcher, Practical Methods of Optimization, 1988.

[8]  Yann LeCun et al., Optimal Brain Damage, NIPS, 1989.

[9]  M. Møller, A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning, 1990.

[10]  David E. Rumelhart et al., Generalization by Weight-Elimination with Application to Forecasting, NIPS, 1990.

[11]  Geoffrey E. Hinton et al., Adaptive Soft Weight Tying using Gaussian Mixtures, NIPS, 1991.

[12]  D. MacKay et al., A Practical Bayesian Framework for Backprop Networks, 1991.

[13]  Wray L. Buntine et al., Bayesian Back-Propagation, Complex Systems, 1991.

[14]  David H. Wolpert et al., On the Use of Evidence in Neural Networks, NIPS, 1992.

[15]  David J. C. MacKay et al., Bayesian Interpolation, Neural Computation, 1992.

[16]  C. M. Bishop et al., Curvature-Driven Smoothing in Backpropagation Neural Networks, 1992.

[17]  Babak Hassibi et al., Second Order Derivatives for Network Pruning: Optimal Brain Surgeon, NIPS, 1992.

[18]  David J. C. MacKay et al., A Practical Bayesian Framework for Backpropagation Networks, Neural Computation, 1992.

[19]  Radford M. Neal, Bayesian Learning via Stochastic Dynamics, NIPS, 1992.

[20]  Radford M. Neal, Bayesian training of backpropagation networks by the hybrid Monte Carlo method, 1992.

[21]  Martin Fodslette Møller et al., A scaled conjugate gradient algorithm for fast supervised learning, Neural Networks, 1993.

[22]  M. F. Møller et al., Exact Calculation of the Product of the Hessian Matrix of Feed-Forward Network Error Functions and a Vector in O(N) Time, 1993.

[23]  Christopher M. Bishop et al., Curvature-driven smoothing: a learning algorithm for feedforward networks, IEEE Transactions on Neural Networks, 1993.

[24]  H. H. Thodberg, Ace of Bayes: Application of Neural Networks with Pruning, 1993.

[25]  P. M. Williams, Improved generalization and network pruning using adaptive Laplace regularization, 1993.

[26]  Barak A. Pearlmutter, Fast Exact Multiplication by the Hessian, Neural Computation, 1994.

[27]  D. Signorini et al., Neural networks, The Lancet, 1995.

[28]  P. M. Williams et al., Using Neural Networks to Model Conditional Multivariate Densities, Neural Computation, 1996.

[29]  Tor Arne Johansen et al., Identification of non-linear systems using empirical data and prior knowledge - an optimization approach, Automatica, 1996.