Multi-Armed Bandits with Metric Movement Costs

We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines the cost of switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution is a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure $\mathcal{C}$ of the underlying metric that depends on its covering numbers. In finite metric spaces with $k$ actions, we give an efficient algorithm that achieves regret of the form $\widetilde{O}(\max\{\mathcal{C}^{1/3}T^{2/3},\sqrt{kT}\})$, and show that this is the best possible. Our regret bound generalizes previously known regret bounds for two special cases: (i) unit switching costs, where $\mathcal{C}=\Theta(k)$ and the regret is $\widetilde{\Theta}(\max\{k^{1/3}T^{2/3},\sqrt{kT}\})$; and (ii) the interval metric, where $\mathcal{C}=\Theta(1)$ and the regret is $\widetilde{\Theta}(\max\{T^{2/3},\sqrt{kT}\})$. For infinite metric spaces with Lipschitz loss functions, we derive a tight regret bound of $\widetilde{\Theta}(T^{\frac{d+1}{d+2}})$, where $d \ge 1$ is the Minkowski dimension of the space; this rate is known to be tight even when there are no switching costs.
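To see how the special cases follow from the general bound, one can simply substitute the stated values of $\mathcal{C}$; the following is a minimal derivation sketch, assuming only the statements above:

$$
\widetilde{\Theta}\Bigl(\max\bigl\{\mathcal{C}^{1/3}T^{2/3},\,\sqrt{kT}\bigr\}\Bigr)
=
\begin{cases}
\widetilde{\Theta}\bigl(\max\{k^{1/3}T^{2/3},\,\sqrt{kT}\}\bigr) & \text{unit switching costs, } \mathcal{C}=\Theta(k),\\[4pt]
\widetilde{\Theta}\bigl(\max\{T^{2/3},\,\sqrt{kT}\}\bigr) & \text{interval metric, } \mathcal{C}=\Theta(1).
\end{cases}
$$

Likewise, taking $d=1$ in the infinite-space bound (e.g., Lipschitz losses on the unit interval) gives $\widetilde{\Theta}(T^{\frac{1+1}{1+2}})=\widetilde{\Theta}(T^{2/3})$, consistent with the interval-metric rate above when $k$ is large.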
