Preferential Bayesian Optimization

Bayesian optimization (BO) has recently emerged as an effective approach to optimizing black-box functions whose direct evaluation is expensive. In this paper we consider the case where direct access to the function is not possible, but information about user preferences is. Such scenarios arise in problems where human preferences are modeled, such as A/B tests or recommender systems. We present a new framework for this scenario, which we call Preferential Bayesian Optimization (PBO), that allows us to find the optimum of a latent function that can only be queried through pairwise comparisons, so-called duels. PBO extends the applicability of standard BO ideas and generalizes previous discrete dueling approaches by modeling the probability of the winner of each duel with a Gaussian process model under a Bernoulli likelihood. The latent preference function is used to define a family of acquisition functions that extend the usual policies used in BO. We illustrate the benefits of PBO in a variety of experiments, showing that PBO needs drastically fewer comparisons to find the optimum. Our experiments indicate that the way correlations are modeled in PBO is key to obtaining this advantage.
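The core modeling idea described above — a Gaussian process with a Bernoulli likelihood over the outcomes of duels — can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes a hypothetical latent utility `latent_f` (never observed directly, only compared), fits a GP classifier on duel pairs `(x, x')` labeled by which point won, and scores each candidate by its average probability of winning against random opponents, a soft Copeland-style score.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Hypothetical latent utility: we can only compare values, never read them.
def latent_f(x):
    return -(x - 0.3) ** 2  # peak at x = 0.3

rng = np.random.default_rng(0)

# Generate duels: each row is a pair (x, x'); label is 1 if x beats x'.
X_duel = rng.uniform(0.0, 1.0, size=(60, 2))
y = (latent_f(X_duel[:, 0]) > latent_f(X_duel[:, 1])).astype(int)

# GP over the pair (x, x') with a Bernoulli (logistic) likelihood,
# approximated internally via Laplace's method.
gp = GaussianProcessClassifier(kernel=RBF(length_scale=0.2),
                               random_state=0).fit(X_duel, y)

# Soft Copeland-style score: mean probability that x wins a duel
# against a set of random opponents.
opponents = rng.uniform(0.0, 1.0, size=20)

def soft_score(x):
    pairs = np.column_stack([np.full_like(opponents, x), opponents])
    return gp.predict_proba(pairs)[:, 1].mean()

grid = np.linspace(0.0, 1.0, 101)
scores = np.array([soft_score(x) for x in grid])
x_best = grid[np.argmax(scores)]
```

In a full PBO loop this score (or an acquisition function built on the latent GP) would select the next duel to run, rather than being evaluated once on a fixed grid; the sketch only shows how duel outcomes translate into a preference surface over the input space.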
