Taking the Human Out of the Loop: A Review of Bayesian Optimization

Big Data applications are typically associated with systems involving large numbers of users, massive complex software systems, and large-scale heterogeneous computing and storage architectures. The construction of such systems involves many distributed design choices. The end products (e.g., recommendation systems, medical analysis tools, real-time game engines, speech recognizers) thus involve many tunable configuration parameters. These parameters are often specified and hard-coded into the software by various developers or teams. Optimizing these parameters jointly can yield significant improvements. Bayesian optimization is a powerful tool for the joint optimization of design choices that has gained great popularity in recent years. It promises greater automation so as to increase both product quality and human productivity. This review paper introduces Bayesian optimization, highlights some of its methodological aspects, and showcases a wide range of applications.
