Regret Bounds for Deterministic Gaussian Process Bandits

This paper analyses the problem of Gaussian process (GP) bandits with deterministic observations. The analysis uses a branch-and-bound algorithm that is related to the UCB algorithm of Srinivas et al. (2010). For GPs with Gaussian observation noise of variance strictly greater than zero, Srinivas et al. (2010) proved that the regret vanishes at the approximate rate of $O(\frac{1}{\sqrt{t}})$, where $t$ is the number of observations. To complement their result, we attack the deterministic case and attain a much faster exponential convergence rate. Under some regularity assumptions, we show that the regret decreases asymptotically according to $O(e^{-\frac{\tau t}{(\ln t)^{d/4}}})$ with high probability. Here, $d$ is the dimension of the search space and $\tau$ is a constant that depends on the behaviour of the objective function near its global maximum.
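
As a concrete illustration of the setting, the sketch below runs a GP-UCB-style loop with noise-free (deterministic) observations on a toy one-dimensional objective. It is a minimal sketch, not the branch-and-bound algorithm analysed in the paper; the squared-exponential kernel, the confidence-width schedule $\beta_t$, the toy objective, and the grid discretisation are all illustrative assumptions.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2)) on 1-D inputs.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

def gp_posterior(X, y, Xs, jitter=1e-10):
    # Posterior mean and standard deviation of a zero-mean GP at test points Xs.
    # Observations are deterministic, so the noise variance is zero; a tiny
    # jitter keeps the Cholesky factorisation numerically stable.
    K = sq_exp_kernel(X, X) + jitter * np.eye(len(X))
    Ks = sq_exp_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    var = np.clip(1.0 - np.sum(v**2, axis=0), 0.0, None)  # k(x, x) = 1 here
    return mu, np.sqrt(var)

f = lambda x: -np.sin(3.0 * x) - x**2 + 0.7 * x   # toy objective (an assumption)
grid = np.linspace(-1.0, 2.0, 500)                # discretised search space
X, y = np.array([0.0]), f(np.array([0.0]))        # one initial observation

for t in range(1, 31):
    mu, sigma = gp_posterior(X, y, grid)
    beta = 2.0 * np.log(len(grid) * t**2)         # one common UCB width schedule (assumed)
    x_next = grid[np.argmax(mu + np.sqrt(beta) * sigma)]
    X, y = np.append(X, x_next), np.append(y, f(x_next))
    print(f"t={t:2d}  simple regret={f(grid).max() - y.max():.3e}")
```

With noise-free observations the posterior variance collapses to zero at every sampled point, which is the mechanism that makes the much faster exponential rates of the deterministic case possible.
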

[1] Niranjan Srinivas, et al., Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.

[2] Kevin Leyton-Brown, et al., Automated Configuration of Mixed Integer Programming Solvers, 2010, CPAIOR.

[3] Carl E. Rasmussen, et al., Gaussian Processes for Machine Learning, 2005, Adaptive Computation and Machine Learning.

[4] Robert B. Gramacy, et al., Parameter space exploration with Gaussian process trees, 2004, ICML.

[5] D. Lizotte, Practical Bayesian Optimization, 2008.

[6] Rémi Munos, et al., Optimistic Optimization of Deterministic Functions, 2011, NIPS.

[7] John Langford, et al., Agnostic active learning, 2006, J. Comput. Syst. Sci.

[8] Andreas Christmann, et al., Support Vector Machines, 2008, Data Mining and Knowledge Discovery Handbook.

[9] Nando de Freitas, et al., A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning, 2010, arXiv.

[10] J. Mockus, et al., The Bayesian Approach to Global Optimization, 1989.

[11] Donald R. Jones, et al., Global versus local search in constrained optimization of computer models, 1998.

[12] Michael L. Stein, Interpolation of Spatial Data: Some Theory for Kriging, 1999, Springer Series in Statistics.

[13] A. Tsybakov, et al., Fast learning rates for plug-in classifiers, 2007, arXiv:0708.2321.

[14] Csaba Szepesvári, et al., X-Armed Bandits, 2011, J. Mach. Learn. Res.

[15] S. Ghosal, et al., Posterior consistency of Gaussian process prior for nonparametric binary regression, 2006, arXiv:math/0702686.

[16] Pierre Hansen, et al., Global optimization of univariate Lipschitz functions: I. Survey and properties, 1989, Math. Program.

[17] Nando de Freitas, et al., A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot, 2009, Auton. Robots.

[18] Vladimir Vapnik, Statistical Learning Theory, 1998.

[19] Pierre Hansen, et al., Global optimization of univariate Lipschitz functions: II. New algorithms and computational comparison, 1989, Math. Program.

[20] David S. Leslie, et al., Optimistic Bayesian Sampling in Contextual-Bandit Problems, 2012, J. Mach. Learn. Res.

[21] R. Adler, et al., Random Fields and Geometry, 2007.

[22] E. Vázquez, et al., Convergence properties of the expected improvement algorithm with fixed mean and covariance functions, 2007, arXiv:0712.3744.

[23] Adam D. Bull, et al., Convergence Rates of Efficient Global Optimization Algorithms, 2011, J. Mach. Learn. Res.

[24] Nando de Freitas, et al., Active Preference Learning with Discrete Choice Data, 2007, NIPS.

[25] Nando de Freitas, et al., Portfolio Allocation for Bayesian Optimization, 2010, UAI.

[26] Adam Tauman Kalai, et al., Analysis of Perceptron-Based Active Learning, 2009, COLT.

[27] Nader H. Bshouty, et al., On Exact Learning Halfspaces with Random Consistent Hypothesis Oracle, 2006, ALT.

[28] Roman Garnett, et al., Bayesian optimization for sensor set selection, 2010, IPSN.