Theoretical Analysis of Bayesian Optimisation with Unknown Gaussian Process Hyper-Parameters

Bayesian optimisation has gained great popularity as a tool for optimising the parameters of machine learning algorithms and models. Somewhat ironically, setting up the hyper-parameters of Bayesian optimisation methods is notoriously hard. While reasonable practical solutions have been advanced, they can often fail to find the best optima. Surprisingly, there is little theoretical analysis of this crucial problem in the literature. To address this, we derive a cumulative regret bound for Bayesian optimisation with Gaussian processes and unknown kernel hyper-parameters in the stochastic setting. The bound, which applies to the expected improvement acquisition function and sub-Gaussian observation noise, provides us with guidelines on how to design hyper-parameter estimation methods. A simple simulation demonstrates the importance of following these guidelines.
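
To make the two quantities named in the abstract concrete, the following is a minimal sketch of the expected improvement acquisition function and of cumulative regret. It assumes the standard closed form of expected improvement for maximisation under a Gaussian posterior, which is not necessarily the exact variant analysed in the paper. The function names and arguments (`mu`, `sigma`, `best`, `f_opt`) are illustrative assumptions, and the Gaussian process posterior that would supply `mu` and `sigma` is taken as given.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form expected improvement for maximisation.

    mu, sigma: posterior mean and standard deviation at a candidate point
               (assumed to come from a Gaussian process model).
    best:      incumbent value that the candidate should improve upon.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best) * cdf + sigma * pdf


def cumulative_regret(f_opt, values):
    """Sum over rounds of the gap between the optimum f_opt and f(x_t)."""
    return sum(f_opt - v for v in values)
```

In this form, a larger posterior standard deviation yields a larger expected improvement at a fixed mean, so the value of the acquisition function depends directly on the kernel hyper-parameters that determine `sigma`.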
