Bandit view on noisy optimization

This chapter deals with the problem of making the best use of a finite number of noisy evaluations to optimize an unknown function. We are primarily concerned with the case where the function is defined over a finite set. In this discrete setting, we discuss various objectives for the learner, from optimizing the allocation of a given budget of evaluations to optimal stopping time problems with (ε, δ)-PAC guarantees. We also consider the so-called online optimization framework, where each evaluation yields a reward and the goal is to maximize the sum of the rewards obtained. In this case, we extend the algorithms to continuous sets and to (weakly) Lipschitz functions (with respect to a prespecified metric).
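To make the online optimization objective concrete, here is a minimal sketch of an index policy for a finite set of arms, using the UCB1 rule of Auer, Cesa-Bianchi, and Fischer (2002): pull the arm maximizing its empirical mean plus an exploration bonus. The function names, the assumption of rewards bounded in [0, 1], and the Bernoulli test arms are illustrative choices, not the chapter's specific algorithms.

```python
import math
import random

def ucb1(arms, n_rounds):
    """Sketch of the UCB1 index policy: maximize cumulative reward over a
    finite set of arms, each being a function returning a noisy reward in [0, 1]."""
    k = len(arms)
    counts = [0] * k      # number of pulls of each arm
    sums = [0.0] * k      # cumulative reward of each arm
    total_reward = 0.0

    for t in range(1, n_rounds + 1):
        if t <= k:
            i = t - 1     # pull each arm once to initialize the estimates
        else:
            # empirical mean + exploration bonus sqrt(2 log t / n_i)
            i = max(range(k),
                    key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2.0 * math.log(t) / counts[j]))
        r = arms[i]()
        counts[i] += 1
        sums[i] += r
        total_reward += r
    return total_reward

# Illustrative use: three Bernoulli arms with unknown means
means = [0.3, 0.5, 0.7]
arms = [lambda p=p: float(random.random() < p) for p in means]
print(ucb1(arms, 10_000))
```

The bonus term shrinks as an arm is pulled more often, so the policy balances exploration of poorly sampled arms against exploitation of the empirically best one; the fixed-budget and (ε, δ)-PAC objectives discussed in the chapter use the same evaluation model but different performance criteria.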
