Unimodal Bandits

We consider multi-armed bandit problems where the expected reward is unimodal over partially ordered arms. In particular, the arms may belong to a continuous interval or correspond to vertices in a graph, where the graph structure represents similarity in rewards. The unimodality assumption has an important advantage: we can determine whether a given arm is optimal by sampling the possible directions around it. This property allows us to find the optimal arm efficiently and to detect abrupt changes in the reward distributions. For bandits on graphs, we incur a regret proportional to the maximal degree and the diameter of the graph, rather than to the total number of vertices.
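The local optimality test described above can be illustrated with a minimal sketch. It is not the algorithm analyzed in the paper, only a naive hill climb under stated assumptions: the graph is a hypothetical 7-vertex path, rewards are Gaussian with a small, known noise level, and each arm is sampled a fixed number of times. The names `local_search`, `pull`, `means`, and `samples_per_arm` are all introduced here for illustration.

```python
import random

def local_search(neighbors, pull, start, samples_per_arm=50):
    """Climb toward the mode by comparing an arm only with its neighbors."""
    def estimate(arm):
        # Empirical mean reward of one arm over a fixed number of pulls.
        return sum(pull(arm) for _ in range(samples_per_arm)) / samples_per_arm

    current = start
    current_mean = estimate(current)
    pulls = samples_per_arm
    while True:
        # Sample every direction around the current arm.
        estimates = {v: estimate(v) for v in neighbors[current]}
        pulls += samples_per_arm * len(estimates)
        if not estimates:
            return current, pulls
        best = max(estimates, key=estimates.get)
        # Under unimodality, no better neighbor means the arm is optimal.
        if estimates[best] <= current_mean:
            return current, pulls
        current, current_mean = best, estimates[best]

# Hypothetical unimodal mean rewards on a path graph; the mode is vertex 4.
means = [0.1, 0.3, 0.5, 0.7, 0.9, 0.6, 0.2]
n = len(means)
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

rng = random.Random(0)  # fixed seed so the run is reproducible

def pull(arm):
    # Assumed reward model: Gaussian noise with standard deviation 0.1.
    return rng.gauss(means[arm], 0.1)

best_arm, total_pulls = local_search(neighbors, pull, start=0)
print(best_arm, total_pulls)
```

In this sketch the number of pulls grows with the number of neighbors sampled at each step and with the length of the path walked to the mode, not with the number of vertices. This mirrors, informally, the dependence on degree and diameter stated above; it is not a regret bound, and the fixed sample size would have to be replaced by a confidence-based stopping rule when the gaps between neighboring arms are unknown.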
