Paper Information - Optimal Brain Damage

Optimal Brain Damage

We have used information-theoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved speed of learning and/or classification. The basic idea is to use second-derivative information to make a tradeoff between network complexity and training set error. Experiments confirm the usefulness of the methods on a real-world application.
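The second-derivative idea in the abstract can be illustrated with a small sketch. It assumes the diagonal-Hessian saliency usually associated with this method, s_k = h_kk * w_k^2 / 2, which is not stated in the abstract itself. The function names `obd_saliencies` and `prune_lowest` and the toy numbers are hypothetical, chosen only for illustration.

```python
def obd_saliencies(weights, hessian_diag):
    """Estimate each weight's saliency as 0.5 * h_kk * w_k**2.

    Assumes hessian_diag[k] is the second derivative of the training
    error with respect to weights[k] (diagonal approximation).
    """
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_lowest(weights, hessian_diag, n_prune):
    """Return a copy of weights with the n_prune least salient set to 0."""
    saliencies = obd_saliencies(weights, hessian_diag)
    # Indices sorted from least to most salient.
    order = sorted(range(len(weights)), key=lambda k: saliencies[k])
    pruned = set(order[:n_prune])
    return [0.0 if k in pruned else w for k, w in enumerate(weights)]


# Hypothetical toy values, not taken from the paper.
weights = [0.9, -0.05, 0.4, 2.0]
hessian_diag = [1.0, 4.0, 0.1, 0.001]

print(obd_saliencies(weights, hessian_diag))
print(prune_lowest(weights, hessian_diag, n_prune=2))
```

In this toy case the largest weight (2.0) is among those removed because its curvature is tiny, which is where a second-derivative criterion can differ from pruning by magnitude alone. In practice the network would normally be retrained after each pruning step.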
