Bandit view on noisy optimization

This chapter deals with the problem of making the best use of a finite number of noisy evaluations to optimize an unknown function. We are primarily concerned with the case where the function is defined over a finite set. In this discrete setting, we discuss various objectives for the learner, from optimizing the allocation of a given budget of evaluations to optimal stopping time problems with (ε, δ)-PAC guarantees. We also consider the so-called online optimization framework, where each evaluation yields a reward and the goal is to maximize the sum of the rewards obtained. In this case, we extend the algorithms to continuous sets and to (weakly) Lipschitz functions (with respect to a prespecified metric).
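To make the online optimization objective concrete, here is a minimal sketch of an index policy for a finite set of arms, using the UCB1 rule of Auer, Cesa-Bianchi, and Fischer (2002): pull the arm maximizing its empirical mean plus an exploration bonus. The function names, the assumption of rewards bounded in [0, 1], and the Bernoulli test arms are illustrative choices, not the chapter's specific algorithms.

```python
import math
import random

def ucb1(arms, n_rounds):
    """Sketch of the UCB1 index policy: maximize cumulative reward over a
    finite set of arms, each being a function returning a noisy reward in [0, 1]."""
    k = len(arms)
    counts = [0] * k      # number of pulls of each arm
    sums = [0.0] * k      # cumulative reward of each arm
    total_reward = 0.0

    for t in range(1, n_rounds + 1):
        if t <= k:
            i = t - 1     # pull each arm once to initialize the estimates
        else:
            # empirical mean + exploration bonus sqrt(2 log t / n_i)
            i = max(range(k),
                    key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2.0 * math.log(t) / counts[j]))
        r = arms[i]()
        counts[i] += 1
        sums[i] += r
        total_reward += r
    return total_reward

# Illustrative use: three Bernoulli arms with unknown means
means = [0.3, 0.5, 0.7]
arms = [lambda p=p: float(random.random() < p) for p in means]
print(ucb1(arms, 10_000))
```

The bonus term shrinks as an arm is pulled more often, so the policy balances exploration of poorly sampled arms against exploitation of the empirically best one; the fixed-budget and (ε, δ)-PAC objectives discussed in the chapter use the same evaluation model but different performance criteria.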
