Concurrent Bandits and Cognitive Radio Networks

We consider the problem of multiple users targeting the arms of a single multi-armed stochastic bandit. The motivation comes from cognitive radio networks, where selfish users must coexist without any side communication, implicit cooperation, or common control; even the number of users may be unknown and can vary as users join or leave the network. We propose an algorithm that combines an ε-greedy learning rule with a collision avoidance mechanism. We analyze its regret with respect to the system-wide optimum and show that sub-linear regret is achievable in this setting. Experiments show a dramatic improvement over other algorithms for this setting.
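The combination described above — ε-greedy arm selection plus a collision reaction — can be sketched in simulation. This is a minimal illustrative sketch, not the paper's exact algorithm: the class names, the Bernoulli channel qualities, and the back-off rule (a colliding user resamples an arm uniformly on its next round) are all assumptions made here for illustration.

```python
import random

class EpsilonGreedyUser:
    """One decentralized user: epsilon-greedy over channel estimates,
    with a simple back-off after a collision (an illustrative assumption)."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.n_arms = n_arms
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.collided = False

    def choose(self):
        # After a collision, or with probability epsilon, pick uniformly
        # at random (exploration doubles as collision avoidance here).
        if self.collided or random.random() < self.epsilon:
            self.collided = False
            return random.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.means[a])

    def update(self, arm, reward, collision):
        self.collided = collision
        if not collision:  # a reward is observed only on a solo pull
            self.counts[arm] += 1
            self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def simulate(n_users=3, n_arms=5, horizon=2000, seed=0):
    random.seed(seed)
    mu = [0.9, 0.8, 0.7, 0.4, 0.2]  # hypothetical channel success rates
    users = [EpsilonGreedyUser(n_arms) for _ in range(n_users)]
    total = 0.0
    for _ in range(horizon):
        picks = [u.choose() for u in users]
        for u, a in zip(users, picks):
            collision = picks.count(a) > 1
            reward = 0.0 if collision else float(random.random() < mu[a])
            total += reward
            u.update(a, reward, collision)
    return total / horizon  # average system throughput per round

throughput = simulate()
```

No user observes the others' choices or rewards: collisions are the only implicit signal, which is the coordination constraint the abstract describes.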
