Evaluation of Gaussian processes and other methods for non-linear regression

This thesis develops two Bayesian learning methods relying on Gaussian processes, together with a rigorous statistical approach for evaluating such methods. The proposed experimental designs account for the sources of uncertainty in estimated generalisation performance arising from variation in both training and test sets. The framework allows estimation of generalisation performance as well as statistical tests of significance for pairwise comparisons of methods. Two experimental designs are recommended and supported by the DELVE software environment. Two new non-parametric Bayesian learning methods relying on Gaussian process priors over functions are developed. These priors are controlled by hyperparameters that set the characteristic length scale for each input dimension. In the simpler method, these hyperparameters are fit from the data by optimisation; in the second, fully Bayesian method, a Markov chain Monte Carlo technique is used to integrate over them. One advantage of these Gaussian process methods is that the priors and hyperparameters of the trained models are easy to interpret. The Gaussian process methods are benchmarked against several other methods on regression tasks using both real data and data generated from realistic simulations. The experiments show that small datasets are unsuitable for benchmarking because the uncertainties in the performance measurements are large. A second set of experiments provides strong evidence that the bagging procedure is advantageous for the Multivariate Adaptive Regression Splines (MARS) method. The simulated datasets have controlled characteristics, which makes them useful for understanding the relationship between properties of a dataset and the performance of different methods. The dependence of performance on available computation time is also investigated: a Bayesian approach to learning in multi-layer perceptron neural networks achieves better performance than the commonly used early-stopping procedure, even for reasonably short amounts of computation time. Finally, the Gaussian process methods are shown to consistently outperform the more conventional methods.
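
As a concrete illustration of the simpler of the two methods described above, the following is a minimal sketch of Gaussian process regression with a squared-exponential covariance using one characteristic length scale per input dimension, where the hyperparameters are fit by maximising the log marginal likelihood. This is only indicative of the general technique, not the thesis's actual implementation; all function and variable names are illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize

def ard_kernel(X1, X2, signal_var, length_scales):
    """Squared-exponential covariance with one length scale per input dimension."""
    diff = X1[:, None, :] - X2[None, :, :]               # shape (n1, n2, d)
    sq_dist = np.sum((diff / length_scales) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist)

def neg_log_marginal_likelihood(log_params, X, y):
    """Negative log marginal likelihood of the GP regression model."""
    d = X.shape[1]
    signal_var = np.exp(log_params[0])
    noise_var = np.exp(log_params[1])
    length_scales = np.exp(log_params[2:2 + d])
    K = ard_kernel(X, X, signal_var, length_scales) + noise_var * np.eye(len(y))
    L, lower = cho_factor(K, lower=True)                 # Cholesky factor of K
    alpha = cho_solve((L, lower), y)                     # K^{-1} y
    return (0.5 * y @ alpha
            + np.sum(np.log(np.diag(L)))                 # 0.5 * log|K|
            + 0.5 * len(y) * np.log(2 * np.pi))

# Toy data: the second input dimension is irrelevant to the target.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

# Fit hyperparameters (in log space, to keep them positive) by optimisation.
res = minimize(neg_log_marginal_likelihood, np.zeros(2 + X.shape[1]), args=(X, y))
signal_var, noise_var = np.exp(res.x[0]), np.exp(res.x[1])
length_scales = np.exp(res.x[2:])
print("learned length scales:", length_scales)  # a large scale flags an irrelevant input

# Predictive mean at test inputs: k(X*, X) K^{-1} y.
Xs = rng.uniform(-3, 3, size=(5, 2))
K = ard_kernel(X, X, signal_var, length_scales) + noise_var * np.eye(len(y))
Ks = ard_kernel(Xs, X, signal_var, length_scales)
print("predictive mean:", Ks @ cho_solve(cho_factor(K, lower=True), y))
```

The learned length scales illustrate why the trained models are easy to interpret: an input that the data reveal to be irrelevant receives a large length scale and hence little influence on the predictions. The fully Bayesian variant replaces the optimisation step with a Markov chain Monte Carlo method, such as hybrid Monte Carlo, that samples the hyperparameters from their posterior and averages the resulting predictions.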
