The Sample Complexity of Exploration in the Multi-Armed Bandit Problem

We consider the multi-armed bandit problem under the PAC (“probably approximately correct”) model. It was shown by Even-Dar et al. [5] that given n arms, it suffices to play the arms a total of \(O\big((n/\epsilon^2)\log(1/\delta)\big)\) times to find an ε-optimal arm with probability at least 1−δ. Our contribution is a matching lower bound that holds for any sampling policy. We also generalize the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.
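To make the PAC guarantee concrete, the following is a minimal sketch of a naive uniform-sampling baseline, not the procedure of Even-Dar et al. and not a construction from this paper. It assumes Bernoulli arms with rewards in {0, 1}; the function name `naive_pac_best_arm` and its parameters are hypothetical. Each arm is pulled \(\lceil (2/\epsilon^2)\ln(2n/\delta) \rceil\) times, which by Hoeffding's inequality and a union bound keeps every empirical mean within ε/2 of its true mean with probability at least 1−δ. The total number of pulls is therefore of order \((n/\epsilon^2)\log(n/\delta)\), a factor of roughly \(\log n\) looser than the upper bound quoted above.

```python
import math
import random


def naive_pac_best_arm(means, epsilon, delta, rng=None):
    """Naive (epsilon, delta)-PAC arm selection by uniform sampling.

    Assumption: arm i yields Bernoulli(means[i]) rewards. `means` is used
    only to simulate pulls; the selection rule sees empirical averages only.
    Returns (index of the chosen arm, total number of pulls).
    """
    rng = rng or random.Random()
    n = len(means)
    # Hoeffding: P(|empirical - true| > eps/2) <= 2 * exp(-pulls * eps^2 / 2).
    # Setting this to delta / n and solving for pulls gives the count below;
    # a union bound over the n arms then yields overall failure prob <= delta.
    pulls_per_arm = math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * n / delta))

    best_arm, best_empirical = -1, -1.0
    for arm, p in enumerate(means):
        successes = sum(1 for _ in range(pulls_per_arm) if rng.random() < p)
        empirical = successes / pulls_per_arm
        if empirical > best_empirical:
            best_arm, best_empirical = arm, empirical
    return best_arm, n * pulls_per_arm


if __name__ == "__main__":
    arm, total = naive_pac_best_arm([0.5, 0.5, 0.9, 0.5], 0.2, 0.1,
                                    rng=random.Random(0))
    print(arm, total)
```

The gap between this baseline's \(\log(n/\delta)\) factor and the \(\log(1/\delta)\) factor in the abstract is what more refined elimination-style procedures close; the lower bound stated above says no sampling policy can do better than that order.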

[1] J. Gani, et al. Progress in Statistics, 1975.

[2] Sheldon M. Ross, et al. Stochastic Processes, 2018, Gauge Integral Structures for Stochastic Calculus and Quantum Electrodynamics.

[3] I. Johnstone, et al. Asymptotically Optimal Procedures for Sequential Adaptive Selection of the Best of Several Normal Means, 1982.

[4] D. Siegmund. Sequential Analysis: Tests and Confidence Intervals, 1985.

[5] H. Chernoff. Sequential Analysis and Optimal Design, 1987.

[6] S. Gupta, et al. Statistical Decision Theory and Related Topics IV, 1988.

[7] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[8] Peter L. Bartlett, et al. Learning in Neural Networks: Theoretical Foundations, 1999.

[9] Peter L. Bartlett, et al. Neural Network Learning - Theoretical Foundations, 1999.

[10] Sanjeev R. Kulkarni, et al. Finite-time lower bounds for the two-armed bandit problem, 2000, IEEE Trans. Autom. Control.

[11] Y. Freund, et al. The non-stochastic multi-armed bandit problem, 2001.

[12] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[13] Shie Mannor, et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 2002, COLT.

[14] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[15] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 2022.