Bayesian Methods for Adaptive Models

The Bayesian framework for model comparison and regularisation is demonstrated by studying interpolation and classification problems modelled with both linear and non-linear models. This framework quantitatively embodies ‘Occam’s razor’. Over-complex and under-regularised models are automatically inferred to be less probable, even though their flexibility allows them to fit the data better. When applied to ‘neural networks’, the Bayesian framework makes possible (1) objective comparison of solutions using alternative network architectures; (2) objective stopping rules for network pruning or growing procedures; (3) objective choice of type of weight decay terms (or regularisers); (4) on-line techniques for optimising weight decay (or regularisation constant) magnitude; (5) a measure of the effective number of well-determined parameters in a model; (6) quantified estimates of the error bars on network parameters and on network output. In the case of classification models, it is shown that the careful incorporation of error bar information into a classifier’s predictions yields improved performance. Comparisons of the inferences of the Bayesian framework with more traditional cross-validation methods help detect poor underlying assumptions in learning models. The relationship of the Bayesian learning framework to ‘active learning’ is examined. Objective functions are discussed which measure the expected informativeness of candidate data measurements, in the context of both interpolation and classification problems. The concepts and methods described in this thesis are quite general and will be applicable to other data modelling problems, whether they involve regression, classification or density estimation.
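As a concrete illustration of items (4) and (5) above, the sketch below implements the standard evidence-based re-estimation of a single weight-decay constant α and noise precision β for a linear-in-the-parameters model. This is a minimal sketch, not the thesis's own code: the function and variable names (`evidence_framework`, `Phi`, `alpha`, `beta`) are illustrative, and a full neural-network treatment would replace Φᵀ Φ with a Gaussian approximation to the Hessian of the error function at a weight optimum.

```python
import numpy as np

def evidence_framework(Phi, t, alpha=1.0, beta=1.0, n_iter=50):
    """Re-estimate a weight-decay constant (alpha) and noise precision (beta)
    for the linear model t ~ N(Phi @ w, 1/beta) with prior w ~ N(0, 1/alpha),
    by maximising the evidence (illustrative sketch of the evidence framework)."""
    N, k = Phi.shape
    PhiT_Phi = Phi.T @ Phi
    data_eigs = np.linalg.eigvalsh(PhiT_Phi)           # eigenvalues of Phi^T Phi
    for _ in range(n_iter):
        A = alpha * np.eye(k) + beta * PhiT_Phi        # posterior precision (Hessian)
        m = beta * np.linalg.solve(A, Phi.T @ t)       # posterior mean of the weights
        lam = beta * data_eigs
        gamma = np.sum(lam / (lam + alpha))            # effective number of well-determined parameters
        alpha = gamma / (m @ m)                        # re-estimate weight decay constant
        beta = (N - gamma) / np.sum((t - Phi @ m) ** 2)  # re-estimate noise precision
    # Log evidence, for comparing alternative models or regularisers (Occam's razor)
    A = alpha * np.eye(k) + beta * PhiT_Phi
    m = beta * np.linalg.solve(A, Phi.T @ t)
    misfit = 0.5 * beta * np.sum((t - Phi @ m) ** 2) + 0.5 * alpha * (m @ m)
    log_evidence = (0.5 * k * np.log(alpha) + 0.5 * N * np.log(beta) - misfit
                    - 0.5 * np.linalg.slogdet(A)[1] - 0.5 * N * np.log(2 * np.pi))
    return m, alpha, beta, gamma, log_evidence

# Example usage on a hypothetical polynomial interpolation problem
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
t = np.sin(3 * x) + 0.1 * rng.standard_normal(30)
Phi = np.vander(x, 6, increasing=True)                 # design matrix of monomial basis functions
m, alpha, beta, gamma, log_ev = evidence_framework(Phi, t)
```

On each pass, γ = Σᵢ λᵢ/(λᵢ + α) counts the well-determined parameters, and the returned log evidence can be compared across alternative basis sets or regularisers; this comparison is the quantitative form of Occam's razor referred to above.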
