Best of both worlds: Stochastic & adversarial best-arm identification

We study best-arm identification in bandits with arbitrary, potentially adversarial rewards. A simple learner that samples arms uniformly achieves the optimal error rate in the adversarial setting. However, such a strategy is suboptimal when the rewards are generated stochastically. We therefore ask: can we design a learner that performs optimally in both the stochastic and the adversarial problem without knowing the nature of the rewards? First, we show that such a learner is impossible to design in general: to be robust to adversarial rewards, it can guarantee optimal error rates only on a subset of the stochastic problems. We then give a lower bound characterizing the optimal rate on stochastic problems for any strategy constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to logarithmic factors) this lower bound on stochastic problems while remaining robust to adversarial ones.
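To make the uniform baseline concrete, here is a minimal sketch of the standard fixed-budget uniform-allocation learner: pull every arm equally often, then recommend the arm with the highest empirical mean. This is an illustration of the baseline the abstract refers to, not the paper's parameter-free algorithm; the names `pull`, `n_arms`, and `budget` are hypothetical.

```python
import random

def uniform_best_arm(pull, n_arms, budget):
    """Fixed-budget uniform-sampling baseline (illustrative sketch).

    Pulls each arm roughly budget / n_arms times in round-robin order,
    then recommends the arm with the highest empirical mean reward.
    """
    sums = [0.0] * n_arms
    counts = [0] * n_arms
    for t in range(budget):
        arm = t % n_arms  # deterministic round-robin allocation
        sums[arm] += pull(arm)
        counts[arm] += 1
    means = [s / c if c > 0 else float("-inf")
             for s, c in zip(sums, counts)]
    return max(range(n_arms), key=lambda a: means[a])

# Example: three Bernoulli arms with means 0.3, 0.5, 0.7.
if __name__ == "__main__":
    arm_means = [0.3, 0.5, 0.7]
    best = uniform_best_arm(lambda a: float(random.random() < arm_means[a]),
                            n_arms=3, budget=3000)
    print("recommended arm:", best)  # arm 2 with high probability
```

Round-robin order is used here for simplicity; against an adaptive adversary one would typically randomize the order in which arms are sampled.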
