Paper information - Minimax Games with Bandits

Minimax Games with Bandits

One of the earliest online learning games, now commonly known as the hedge setting [Freund and Schapire, 1997], goes as follows. On round $t$, a Learner chooses a distribution $w_t$ over a set of $n$ actions, an Adversary reveals $\ell_t \in [0, 1]^n$, a vector of losses for each action, and the Learner suffers $w_t \cdot \ell_t = \sum_{i=1}^{n} w_{t,i} \ell_{t,i}$. Freund and Schapire [1997] showed that a very simple strategy of exponentially weighting the actions according to their cumulative losses provides a near-optimal guarantee. That is, by setting
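The excerpt above breaks off before stating the weight formula. As a rough illustration of the exponential weighting strategy it describes, the following Python sketch uses the standard textbook form of the update, $w_{t,i} \propto \exp(-\eta L_{t-1,i})$, where $L_{t-1,i}$ is the cumulative loss of action $i$ before round $t$. This form, the learning rate `eta`, and the function names `hedge_weights` and `run_hedge` are assumptions made for illustration and are not taken from the text above.

```python
import math


def hedge_weights(cumulative_losses, eta):
    """Return weights proportional to exp(-eta * L_i) (assumed standard form)."""
    # Subtracting the minimum loss avoids underflow; it cancels after normalizing.
    m = min(cumulative_losses)
    unnormalized = [math.exp(-eta * (L - m)) for L in cumulative_losses]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


def run_hedge(loss_vectors, eta):
    """Play the hedge game on a fixed loss sequence.

    loss_vectors: list of per-round loss vectors, each entry in [0, 1].
    Returns (Learner's total loss, per-action cumulative losses).
    """
    n = len(loss_vectors[0])
    cumulative = [0.0] * n
    total = 0.0
    for losses in loss_vectors:
        # The Learner commits to w_t before seeing the round's losses.
        w = hedge_weights(cumulative, eta)
        # The Learner suffers the inner product w_t . l_t.
        total += sum(wi * li for wi, li in zip(w, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return total, cumulative


if __name__ == "__main__":
    # Hypothetical loss sequence for two actions over three rounds.
    example_losses = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
    total_loss, per_action = run_hedge(example_losses, eta=0.5)
    print("Learner loss:", total_loss)
    print("Best action loss:", min(per_action))
```

With losses in $[0, 1]$, this update is known to keep the Learner's total loss within $\ln(n)/\eta + \eta T/8$ of the best single action over $T$ rounds, which is one way to read the "near-optimal guarantee" mentioned above.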

Manfred K. Warmuth | Jacob D. Abernethy

[1] Manfred K. Warmuth, et al. The Minimax Strategy for Gaussian Density Estimation, 2000, COLT.

[2] Manfred K. Warmuth, et al. The Weighted Majority Algorithm, 1994, Inf. Comput.

[3] Vladimir Vovk, et al. A game of prediction with expert advice, 1995, COLT '95.

[4] E. Takimoto, et al. The Minimax Strategy for Gaussian Density Estimation, 2000.

[5] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[6] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[7] Ambuj Tewari, et al. Optimal Strategies and Minimax Lower Bounds for Online Convex Games, 2008, COLT.

[8] Manfred K. Warmuth, et al. When Random Play is Optimal Against an Adversary, 2008, COLT.

[9] John Langford, et al. Continuous Experts and the Binning Algorithm, 2006, COLT.