Bayesian Back-Propagation

Connectionist feed-forward networks, trained with back-propagation, can be used both for nonlinear regression and for (discrete one-of-C) classification. This paper presents approximate Bayesian methods for the statistical components of back-propagation: choosing a cost function and penalty term (interpreted as a form of prior probability), pruning insignificant weights, estimating the uncertainty of weights, predicting for new patterns ("out-of-sample"), estimating the uncertainty in the choice of this prediction ("error bars"), estimating the generalization error, comparing different network structures, and handling missing values in the training patterns. These methods extend some heuristic techniques suggested in the literature, and in most cases require either a small additional factor in computation during back-propagation or additional computation once back-propagation has finished.
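To make the reading of a penalty term as a prior probability concrete, the following is a minimal sketch under assumptions not taken from the abstract: Gaussian output noise with variance `noise_var` and an independent zero-mean Gaussian prior on each weight with variance `prior_var`. Under those assumptions the negative log posterior of the weights, up to constants, is a sum-of-squares cost plus a weight-decay penalty whose coefficient is `noise_var / prior_var`. All function and parameter names are illustrative, and the paper itself may use other cost functions and priors.

```python
def sum_squared_error(residuals):
    """Half the sum of squared residuals (target minus network output)."""
    return 0.5 * sum(r * r for r in residuals)


def weight_decay(weights):
    """Half the sum of squared weights, the usual weight-decay penalty."""
    return 0.5 * sum(w * w for w in weights)


def neg_log_posterior(weights, residuals, noise_var, prior_var):
    """Negative log posterior of the weights, dropping additive constants.

    Assumes Gaussian noise (variance noise_var) on the outputs and an
    independent zero-mean Gaussian prior (variance prior_var) on each weight.
    """
    return (sum_squared_error(residuals) / noise_var
            + weight_decay(weights) / prior_var)


def penalized_cost(weights, residuals, decay):
    """Sum-of-squares cost plus a weight-decay penalty with coefficient decay."""
    return sum_squared_error(residuals) + decay * weight_decay(weights)


# Illustrative values only; residuals would normally come from a trained network.
weights = [0.5, -1.0, 2.0]
residuals = [0.1, -0.2]
noise_var, prior_var = 0.25, 4.0

# Scaling the negative log posterior by noise_var recovers the penalized cost
# with decay = noise_var / prior_var, so both have the same minimizer.
decay = noise_var / prior_var
scaled_posterior = noise_var * neg_log_posterior(weights, residuals, noise_var, prior_var)
cost = penalized_cost(weights, residuals, decay)
```

Because the two quantities differ only by a positive scale factor, minimizing the penalized cost with back-propagation corresponds, under these assumptions, to finding the maximum a posteriori weights.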
