Bandits with Movement Costs and Adaptive Pricing

We extend the model of Multi-armed Bandit with unit switching cost to incorporate a metric between the actions. We consider the case where the metric over the actions can be modeled by a complete binary tree, and the distance between two leaves is the size of the subtree of their least common ancestor, which abstracts the case that the actions are points on the continuous interval $[0,1]$ and the switching cost is their distance. In this setting, we give a new algorithm that establishes a regret of $\widetilde{O}(\sqrt{kT} + T/k)$, where $k$ is the number of actions and $T$ is the time horizon. When the set of actions corresponds to whole $[0,1]$ interval we can exploit our method for the task of bandit learning with Lipschitz loss functions, where our algorithm achieves an optimal regret rate of $\widetilde{\Theta}(T^{2/3})$, which is the same rate one obtains when there is no penalty for movements. As our main application, we use our new algorithm to solve an adaptive pricing problem. Specifically, we consider the case of a single seller faced with a stream of patient buyers. Each buyer has a private value and a window of time in which they are interested in buying, and they buy at the lowest price in the window, if it is below their value. We show that with an appropriate discretization of the prices, the seller can achieve a regret of $\widetilde{O}(T^{2/3})$ compared to the best fixed price in hindsight, which outperform the previous regret bound of $\widetilde{O}(T^{3/4})$ for the problem.

[1]  Yin Tat Lee,et al.  Kernel-based methods for bandit convex optimization , 2016, STOC.

[2]  Jean-Yves Audibert,et al.  Minimax Policies for Adversarial and Stochastic Bandits. , 2009, COLT 2009.

[3]  Csaba Szepesvári,et al.  –armed Bandits , 2022 .

[4]  Noga Alon,et al.  Online Learning with Feedback Graphs: Beyond Bandits , 2015, COLT.

[5]  Robert D. Kleinberg Nearly Tight Bounds for the Continuum-Armed Bandit Problem , 2004, NIPS.

[6]  Eric W. Cope,et al.  Regret and Convergence Bounds for a Class of Continuum-Armed Bandit Problems , 2009, IEEE Transactions on Automatic Control.

[7]  Aleksandrs Slivkins,et al.  Sharp dichotomies for regret minimization in metric spaces , 2009, SODA '10.

[8]  Aleksandrs Slivkins,et al.  Multi-armed bandits on implicit metric spaces , 2011, NIPS.

[9]  Roi Livni,et al.  Online Pricing with Strategic and Patient Buyers , 2016, NIPS.

[10]  Eli Upfal,et al.  Multi-Armed Bandits in Metric Spaces ∗ , 2008 .

[11]  Omar Besbes,et al.  Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms , 2009, Oper. Res..

[12]  Frank Thomson Leighton,et al.  The value of knowing a demand curve: bounds on regret for online posted-price auctions , 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings..

[13]  Elad Hazan,et al.  The Blinded Bandit: Learning with Adaptive Feedback , 2014, NIPS.

[14]  Sudipto Guha,et al.  Multi-armed Bandits with Metric Switching Costs , 2009, ICALP.

[15]  Peter Auer,et al.  Improved Rates for the Stochastic Continuum-Armed Bandit Problem , 2007, COLT.

[16]  Sébastien Bubeck,et al.  Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..

[17]  Peter L. Bartlett,et al.  Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions , 2013, NIPS.

[18]  Shie Mannor,et al.  Unimodal Bandits , 2011, ICML.

[19]  Filip Radlinski,et al.  Ranked bandits in metric spaces: learning diverse rankings over large document collections , 2013, J. Mach. Learn. Res..

[20]  Atri Rudra,et al.  Dynamic pricing for impatient bidders , 2007, SODA '07.

[21]  Yair Bartal,et al.  Probabilistic approximation of metric spaces and its algorithmic applications , 1996, Proceedings of 37th Conference on Foundations of Computer Science.

[22]  Maria-Florina Balcan,et al.  Sequential Item Pricing for Unlimited Supply , 2010, WINE.

[23]  Yuval Peres,et al.  Bandits with switching costs: T2/3 regret , 2013, STOC.

[24]  Thomas P. Hayes,et al.  The Price of Bandit Information for Online Optimization , 2007, NIPS.

[25]  Maria-Florina Balcan,et al.  Approximation Algorithms and Online Mechanisms for Item Pricing , 2007, Theory Comput..

[26]  Peter Auer,et al.  The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..