Paper information - Algorithms for Infinitely Many-Armed Bandits

Algorithms for Infinitely Many-Armed Bandits

We consider multi-armed bandit problems in which the number of arms exceeds the number of experiments that can be run. We make a stochastic assumption on the mean reward of a newly selected arm, which characterizes its probability of being near-optimal. Our assumption is weaker than those made in previous works. We describe algorithms based on upper confidence bounds applied to a restricted set of randomly selected arms, and we provide upper bounds on the resulting expected regret. We also derive a lower bound that matches the upper bound, up to a logarithmic factor, in some cases.
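The idea in the abstract, running an upper-confidence-bound rule on a restricted set of randomly selected arms, can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract states neither the index nor how many arms to select, so the code below substitutes the standard UCB1 index from reference [3], assumes Bernoulli rewards, and picks the subset size as the square root of the horizon purely for illustration. The names `ucb_on_random_subset` and `draw_arm_mean` are hypothetical.

```python
import math
import random


def ucb_on_random_subset(n_rounds, k_arms, draw_arm_mean, seed=0):
    """Run UCB1 on k_arms arms drawn at random from an unbounded reservoir.

    draw_arm_mean(rng) is a hypothetical callback that returns the mean
    reward of a freshly drawn arm. Rewards are assumed to be Bernoulli.
    Returns (means, pulls, total_reward).
    """
    rng = random.Random(seed)
    # Restrict play to a random subset of the (conceptually infinite) arms.
    means = [draw_arm_mean(rng) for _ in range(k_arms)]
    pulls = [0] * k_arms
    sums = [0.0] * k_arms
    total = 0.0

    for t in range(1, n_rounds + 1):
        if t <= k_arms:
            # Initialization: pull each selected arm once.
            arm = t - 1
        else:
            # UCB1 index: empirical mean plus an exploration bonus.
            arm = max(
                range(k_arms),
                key=lambda i: sums[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        total += reward

    return means, pulls, total


if __name__ == "__main__":
    n = 10_000
    k = math.isqrt(n)  # illustrative subset size, not taken from the paper
    # Assumption: mean rewards of new arms are uniform on [0, 1].
    means, pulls, total = ucb_on_random_subset(n, k, lambda rng: rng.random())
    best = max(range(k), key=lambda i: pulls[i])
    print(f"most-pulled arm has mean {means[best]:.3f}; best mean is {max(means):.3f}")
    print(f"average reward per round: {total / n:.3f}")
```

The subset size governs a trade-off: too few arms risks drawing no near-optimal arm, while too many leaves too little budget to tell the selected arms apart.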

[1] R. Agrawal. The Continuum-Armed Bandit Problem, 1995.

[2] Robert W. Chen, et al. Bandit problems with infinitely many arms, 1997.

[3] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[4] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem, 2004, NIPS.

[5] Robert D. Kleinberg, et al. Online decision problems with large strategy sets, 2005.

[6] Csaba Szepesvári, et al. Tuning Bandit Algorithms in Stochastic Environments, 2007, ALT.

[7] S. Gelly, et al. Anytime many-armed bandits, 2007.

[8] Peter Auer, et al. Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 2007, COLT.

[9] Eli Upfal, et al. Multi-Armed Bandits in Metric Spaces, 2008.

[10] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.