Competing Bandits: Learning Under Competition

Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We initiate a study of the interplay between exploration and competition: how such systems balance exploration for learning against competition for users. Here the users play three distinct roles: they are customers who generate revenue, sources of data for learning, and self-interested agents who choose among the competing systems. In our model, two multi-armed bandit algorithms compete on the same bandit instance. Users arrive one by one and choose between the two algorithms, so each algorithm makes progress if and only if it is chosen. We ask whether, and to what extent, competition incentivizes the adoption of better bandit algorithms. We investigate this question under several models of user response, varying the degree of rationality and competitiveness in the model. Our findings are closely related to the "competition vs. innovation" relationship, a well-studied theme in economics.
