Mistake bounds on the noise-free multi-armed bandit game

Abstract: We study the {0,1}-loss version of the adaptive adversarial multi-armed bandit problem with α (≥ 1) lossless arms. For this problem, we show a tight bound of K − α − Θ(1/T) on the minimax expected number of mistakes (1-losses), where K is the number of arms and T is the number of rounds.
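As a rough illustration of where the leading K − α term comes from, the sketch below plays a simple elimination strategy: stay on an arm until it produces a 1-loss, then discard it and move to a surviving arm. Each mistake removes one arm that is not lossless, so the player can make at most K − α mistakes. This is not the strategy analyzed in the paper and does not reproduce the Θ(1/T) term. The names `play_elimination`, `lossless`, and `rng` are illustrative assumptions, and the adversary here is a fixed one (every arm outside `lossless` always returns loss 1), not a fully adaptive one.

```python
import random


def play_elimination(num_arms, lossless, num_rounds, rng):
    """Play num_rounds rounds and return the number of 1-losses suffered.

    Assumed adversary: every arm outside `lossless` returns loss 1 whenever
    it is pulled; arms in `lossless` always return loss 0.
    """
    alive = list(range(num_arms))      # arms not yet seen to incur a loss
    current = rng.choice(alive)        # arm currently being played
    mistakes = 0
    for _ in range(num_rounds):
        loss = 0 if current in lossless else 1
        if loss == 1:
            mistakes += 1
            alive.remove(current)      # never play a failed arm again
            current = rng.choice(alive)  # non-empty: lossless arms survive
    return mistakes


if __name__ == "__main__":
    K, alpha, T = 5, 1, 100
    m = play_elimination(K, lossless={K - 1}, num_rounds=T, rng=random.Random(0))
    print(f"mistakes = {m}, upper bound K - alpha = {K - alpha}")
```

The count returned never exceeds min(T, K − α), whichever arms the random choices land on.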
