Committing Bandits
[1] Peter Auer,et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem , 2010, Period. Math. Hung..
[2] Rémi Munos,et al. Pure exploration in finitely-armed and continuous-armed bandits , 2011, Theor. Comput. Sci..
[3] Shie Mannor,et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..
[4] John N. Tsitsiklis,et al. Linearly Parameterized Bandits , 2008, Math. Oper. Res..
[5] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[6] Rémi Munos,et al. Bandit Algorithms for Tree Search , 2007, UAI.
[7] Shie Mannor,et al. k-Armed Bandit , 2010, Encyclopedia of Machine Learning.
[8] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[9] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[10] R. Agrawal. The Continuum-Armed Bandit Problem , 1995 .
[11] J. Langford,et al. The Epoch-Greedy algorithm for contextual multi-armed bandits , 2007, NIPS.
[12] John N. Tsitsiklis,et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..
[13] Eli Upfal,et al. Multi-Armed Bandits in Metric Spaces , 2008.
[14] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985, Adv. Appl. Math..