Bayesian Learning for Neural Networks

Artificial "neural networks" are widely used as flexible models for classification and regression applications, but questions remain about how the power of these models can be safely exploited when training data is limited. This book demonstrates how Bayesian methods allow complex neural network models to be used without fear of the "overfitting" that can occur with traditional training methods. Insight into the nature of these complex Bayesian models is provided by a theoretical investigation of the priors over functions that underlie them. A practical implementation of Bayesian neural network learning using Markov chain Monte Carlo methods is also described, and software for it is freely available over the Internet. Presupposing only basic knowledge of probability and statistics, this book should be of interest to researchers in statistics, engineering, and artificial intelligence.
