Multi-User Communication Networks: A Coordinated Multi-Armed Bandit Approach

Sharing a communication network among many users is a widespread challenge nowadays. In this paper we address several aspects of this challenge simultaneously: learning unknown stochastic network characteristics, sharing resources with other users, and keeping coordination overhead to a minimum. The proposed solution combines Multi-Armed Bandit learning with a lightweight signalling-based coordination scheme and ensures convergence to a stable allocation of resources. We present single-user-level algorithms for two scenarios: an unknown but fixed number of users, and a dynamic number of users. For both setups we provide analytic performance guarantees that prove convergence to stable marriage configurations. The algorithms are designed from a system-wide perspective rather than focusing on the welfare of individual users, thus ensuring maximal resource utilization. An extensive experimental analysis covers both convergence to a stable configuration and reward maximization. Experiments carried out over a wide range of setups demonstrate the advantages of our approach over existing state-of-the-art methods.
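The notion of a stable allocation can be illustrated with a small, centralized sketch. Assume, hypothetically, that the expected reward `rewards[n][k]` of every user-channel pair is known, and that users rank channels and channels rank users by that same value. A stable marriage is then an assignment with no blocking pair: no user and channel that both prefer each other to their current partners. The Gale-Shapley deferred-acceptance procedure below computes one such assignment, and `is_stable` checks the blocking-pair condition. This is not the authors' method. Their single-user algorithms are distributed, learn the rewards online, and coordinate through signalling, whereas this sketch assumes a central planner with full knowledge. The function names are illustrative only.

```python
from typing import List, Optional


def stable_allocation(rewards: List[List[float]]) -> List[Optional[int]]:
    """Gale-Shapley with users proposing to channels (illustrative sketch).

    Assumption: rewards[n][k] is the known expected reward of user n on
    channel k; users rank channels by it and channels rank users by it.
    Returns assignment[n] = channel index, or None if user n is unassigned.
    """
    num_users = len(rewards)
    num_channels = len(rewards[0]) if num_users else 0
    # Each user's channel preference list, best channel first.
    prefs = [sorted(range(num_channels), key=lambda k: -rewards[n][k])
             for n in range(num_users)]
    next_choice = [0] * num_users  # position in prefs[n] of the next proposal
    holder: List[Optional[int]] = [None] * num_channels  # user held by channel k
    free = list(range(num_users))
    while free:
        n = free.pop()
        if next_choice[n] >= num_channels:
            continue  # rejected by every channel: user n stays unassigned
        k = prefs[n][next_choice[n]]
        next_choice[n] += 1
        current = holder[k]
        if current is None:
            holder[k] = n  # channel k was idle and accepts n
        elif rewards[n][k] > rewards[current][k]:
            holder[k] = n  # channel k prefers n and drops its current user
            free.append(current)
        else:
            free.append(n)  # channel k rejects n; n will try its next channel
    assignment: List[Optional[int]] = [None] * num_users
    for k, n in enumerate(holder):
        if n is not None:
            assignment[n] = k
    return assignment


def is_stable(rewards: List[List[float]], assignment: List[Optional[int]]) -> bool:
    """Return True if no (user, channel) pair would both rather be matched."""
    num_channels = len(rewards[0])
    holder: List[Optional[int]] = [None] * num_channels
    for n, k in enumerate(assignment):
        if k is not None:
            holder[k] = n
    for n, k_own in enumerate(assignment):
        own = rewards[n][k_own] if k_own is not None else float("-inf")
        for k in range(num_channels):
            if rewards[n][k] > own:
                h = holder[k]
                # Blocking pair: channel k is idle or prefers n to its holder.
                if h is None or rewards[n][k] > rewards[h][k]:
                    return False
    return True
```

For `rewards = [[0.9, 0.5], [0.8, 0.6]]`, both users prefer channel 0, channel 0 prefers user 0, and `stable_allocation` returns `[0, 1]`. The swapped assignment `[1, 0]` is not stable, because user 0 and channel 0 would both rather be matched to each other.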
