Learning to coordinate without communication in multi-user multi-armed bandit problems

We consider a setting in which multiple users share multiple channels, modeled as a multi-user multi-armed bandit (MAB) problem. The characteristics of each channel are initially unknown and may differ between users. Each user can choose among the channels, but her success depends on the particular channel as well as on the selections of the other users: if two users select the same channel, their messages collide and neither manages to send any data. Our setting is fully distributed: there is no central control, and every user observes only the channel she currently uses. As in many communication systems, such as cognitive radio networks, the users cannot communicate among themselves, so coordination must be achieved without direct communication. We develop algorithms for learning a stable configuration for the multi-user MAB problem. We further offer both convergence guarantees and experiments inspired by real communication networks.
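To make the collision model concrete, the following is a minimal sketch of a single round of play, not the paper's algorithm. It assumes binary (success/failure) rewards and a per-user, per-channel success probability table; the names `play_round`, `choices`, and `success_prob` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def play_round(choices, success_prob, rng=random):
    """Return one reward per user for a single round.

    choices[u] is the channel picked by user u.
    success_prob[u][c] is an assumed success probability of user u on
    channel c (it may differ between users, as in the setting above).
    """
    load = Counter(choices)  # number of users on each channel
    rewards = []
    for user, channel in enumerate(choices):
        if load[channel] > 1:
            # Collision: no user on this channel sends any data.
            rewards.append(0)
        else:
            # Sole user on the channel: Bernoulli draw (an assumption).
            p = success_prob[user][channel]
            rewards.append(1 if rng.random() < p else 0)
    return rewards
```

In this sketch each user would see only her own entry of the returned list, which mirrors the restriction that a user observes only the channel she currently uses.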
