论文信息 - The Sample Complexity of Exploration in the Multi-Armed Bandit Problem

The Sample Complexity of Exploration in the Multi-Armed Bandit Problem

We consider the Multi-armed bandit problem under the PAC (“probably approximately correct”) model. It was shown by Even-Dar et al. [5] that given n arms, it suffices to play the arms a total of\(O\big(({n}/{\epsilon^2})\log ({1}/{\delta})\big)\) times to find an e-optimal arm with probability of at least 1-δ. Our contribution is a matching lower bound that holds for any sampling policy. We also generalize the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.

John N. Tsitsiklis | Shie Mannor | Shie Mannor | J. Tsitsiklis

[1] J. Gani,et al. Progress in statistics , 1975 .

[2] Sheldon M. Ross,et al. Stochastic Processes , 2018, Gauge Integral Structures for Stochastic Calculus and Quantum Electrodynamics.

[3] I. Johnstone,et al. ASYMPTOTICALLY OPTIMAL PROCEDURES FOR SEQUENTIAL ADAPTIVE SELECTION OF THE BEST OF SEVERAL NORMAL MEANS , 1982 .

[4] D. Siegmund. Sequential Analysis: Tests and Confidence Intervals , 1985 .

[5] H. Chernoff. Sequential Analysis and Optimal Design , 1987 .

[6] S. Gupta,et al. Statistical decision theory and related topics IV , 1988 .

[7] Nicolò Cesa-Bianchi,et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[8] Peter L. Bartlett,et al. Learning in Neural Networks: Theoretical Foundations , 1999 .

[9] Peter L. Bartlett,et al. Neural Network Learning - Theoretical Foundations , 1999 .

[10] Sanjeev R. Kulkarni,et al. Finite-time lower bounds for the two-armed bandit problem , 2000, IEEE Trans. Autom. Control..

[11] Y. Freund,et al. The non-stochastic multi-armed bandit problem , 2001 .

[12] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..

[13] Shie Mannor,et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes , 2002, COLT.

[14] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.

[15] T. L. Lai Andherbertrobbins. Asymptotically Efficient Adaptive Allocation Rules , 2022 .