Improved Learning Complexity in Combinatorial Pure Exploration Bandits

We study the problem of combinatorial pure exploration in the stochastic multi-armed bandit setting. We first construct a new measure of complexity that provably characterizes the learning performance of the algorithms we propose, in both the fixed-confidence and the fixed-budget setting. We show that this complexity is never higher than the one in existing work, and we illustrate a number of configurations in which it can be significantly smaller. While this improvement generally comes at the cost of increased computational complexity, we provide a series of examples, including a planning problem, in which this extra cost is not significant.
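To make the fixed-confidence setting concrete, the sketch below shows a combinatorial pure exploration loop in the spirit of the CLUCB algorithm of Chen et al. (Combinatorial Pure Exploration of Multi-Armed Bandits, NIPS 2014), instantiated for the top-m decision class. The Bernoulli reward model, the confidence-radius constants, and the function names (`clucb_top_m`, `top_m_oracle`) are illustrative assumptions for this sketch, not the algorithm or complexity measure analyzed in this paper.

```python
import math
import random

def top_m_oracle(values, m):
    """Maximization oracle for the top-m decision class: return the
    indices of the m largest entries (ties broken arbitrarily)."""
    return set(sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:m])

def clucb_top_m(pull, n, m, delta, max_pulls=200_000):
    """Fixed-confidence combinatorial pure exploration, CLUCB-style sketch.

    pull(i) returns a stochastic reward in [0, 1] for arm i.
    Samples adaptively until the confidence intervals certify a set of
    m arms, aiming for correctness with probability at least 1 - delta.
    NOTE: the radius below uses illustrative, untuned constants.
    """
    counts = [0] * n
    sums = [0.0] * n
    # Initialization: pull every arm once.
    for i in range(n):
        sums[i] += pull(i)
        counts[i] += 1
    for t in range(n, max_pulls):
        means = [sums[i] / counts[i] for i in range(n)]
        # Anytime confidence radius (illustrative choice).
        rad = [math.sqrt(math.log(4.0 * n * (t + 1) ** 3 / delta) / (2 * counts[i]))
               for i in range(n)]
        best = top_m_oracle(means, m)
        # Perturb the means adversarially against the current guess:
        # deflate arms inside `best`, inflate arms outside it.
        adjusted = [means[i] - rad[i] if i in best else means[i] + rad[i]
                    for i in range(n)]
        challenger = top_m_oracle(adjusted, m)
        # Stop when even the perturbed means cannot produce a strictly
        # better set than the current guess.
        if sum(adjusted[i] for i in challenger) <= sum(adjusted[i] for i in best):
            return best
        # Otherwise pull the most uncertain arm on which the two sets disagree.
        p = max(best.symmetric_difference(challenger), key=lambda i: rad[i])
        sums[p] += pull(p)
        counts[p] += 1
    return top_m_oracle([sums[i] / counts[i] for i in range(n)], m)

if __name__ == "__main__":
    mu = [0.9, 0.8, 0.5, 0.4, 0.3]                      # true (unknown) means, for simulation
    bandit = lambda i: float(random.random() < mu[i])   # Bernoulli pull
    print(clucb_top_m(bandit, n=5, m=2, delta=0.05))    # expect {0, 1}
```

A learning-complexity measure enters through the stopping time: the number of pulls such a loop performs before the perturbed oracle call can no longer overturn the current guess scales with a gap-dependent complexity of the instance, which is the kind of quantity this paper refines.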
