Paper information - Online Optimization in X-Armed Bandits

Online Optimization in X-Armed Bandits

We consider a generalization of stochastic bandit problems where the set of arms, Χ, is allowed to be a generic topological space. We constrain the mean-payoff function with a dissimilarity function over Χ in a way that is more general than Lipschitz. We construct an arm selection policy whose regret improves upon previous results for a large class of problems. In particular, our results imply that if Χ is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Hölder with a known exponent, then the expected regret is bounded, up to a logarithmic factor, by √n; that is, the growth rate of the regret is independent of the dimension of the space. Moreover, we prove the minimax optimality of our algorithm for the class of mean-payoff functions we consider.
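
As a reading aid, the regret claim in the abstract can be written out in symbols. The notation below (f for the mean-payoff function, X_t for the arm chosen in round t, and the constants C and c) is assumed for illustration and does not appear in the abstract; the first display is one common way to define cumulative (pseudo-)regret, not necessarily the exact definition used in the paper.

```latex
% Assumed notation (not taken from the abstract):
%   f   : mean-payoff function on the arm space \mathcal{X}
%   X_t : arm selected in round t
%   n   : number of rounds
\[
  R_n \;=\; n \sup_{x \in \mathcal{X}} f(x) \;-\; \sum_{t=1}^{n} f(X_t)
\]
% "Bounded up to a logarithmic factor by \sqrt{n}" then reads, for some
% unspecified constants C > 0 and c \ge 0:
\[
  \mathbb{E}[R_n] \;\le\; C \,\sqrt{n}\,(\log n)^{c}
\]
% The exponent 1/2 on n does not involve the dimension of the hypercube;
% the constant C may still depend on it.
```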
