Heteroscedastic Treed Bayesian Optimisation

Optimising black-box functions is important in many disciplines, such as machine-learning model tuning, robotics, finance and mining exploration. Bayesian optimisation is a state-of-the-art technique for the global optimisation of black-box functions that are expensive to evaluate. At the core of this approach is a Gaussian process prior that captures our belief about the distribution over functions. However, in many cases a single Gaussian process is not flexible enough to capture non-stationarity in the objective function. In particular, non-stationarity in the form of heteroscedasticity degrades the performance of traditional Bayesian optimisation methods. In this paper, we propose a novel prior model with hierarchical parameter learning that tackles the problem of non-stationarity in Bayesian optimisation. Our results demonstrate substantial improvements in a wide range of applications, including automatic machine learning and mining exploration.
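To make the setting concrete, the loop underlying standard (stationary) Bayesian optimisation can be sketched as follows. This is a minimal illustration, not the paper's treed heteroscedastic model: it assumes a 1-D domain [0, 1], a squared-exponential kernel with a fixed length scale, and a lower-confidence-bound acquisition rule; the function names (`rbf`, `gp_posterior`, `bayes_opt`) and the toy objective are hypothetical.

```python
import numpy as np

def rbf(A, B, ell=0.2):
    """Squared-exponential (stationary) kernel between 1-D input arrays A and B."""
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) / ell) ** 2)

def gp_posterior(X, y, Xs, jitter=1e-6):
    """Exact GP posterior mean and standard deviation at test points Xs given data (X, y)."""
    K = rbf(X, X) + jitter * np.eye(len(X))        # kernel matrix with numerical jitter
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(X, Xs)
    mu = Ks.T @ alpha                              # posterior mean
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)  # posterior variance
    return mu, np.sqrt(var)

def bayes_opt(f, n_init=5, n_iter=15, kappa=2.0, seed=0):
    """Minimise an expensive black-box f on [0, 1] with a GP surrogate
    and a lower-confidence-bound acquisition rule."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, n_init)              # initial random design
    y = f(X)
    grid = np.linspace(0.0, 1.0, 201)              # candidate points for the acquisition
    for _ in range(n_iter):
        mu, sigma = gp_posterior(X, y, grid)
        x_next = grid[np.argmin(mu - kappa * sigma)]  # trade off mean vs. uncertainty
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))                # one expensive evaluation per iteration
    i = np.argmin(y)
    return X[i], y[i]

# Toy run: minimise (x - 0.6)^2, whose global minimum is at x = 0.6.
x_best, y_best = bayes_opt(lambda x: (x - 0.6) ** 2)
```

A single stationary kernel like the one above assumes the same smoothness and noise level everywhere in the domain; it is exactly this assumption that the heteroscedastic treed prior proposed in the paper relaxes.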
