PAC Best Arm Identification Under a Deadline

We study (ε, δ)-PAC best arm identification, where a decision-maker must identify an ε-optimal arm with probability at least 1 − δ while minimizing the number of arm pulls (samples). Most prior work on this topic considers the sequential setting, where there is no constraint on the time taken to identify such an arm; this allows the decision-maker to pull one arm at a time. In this work, the decision-maker is given a deadline of T rounds, where, on each round, it can adaptively choose which arms to pull and how many times to pull them; this distinguishes the number of decisions made (i.e., time or number of rounds) from the number of samples acquired (cost). Such situations arise in clinical trials, where one may need to identify a promising treatment under a deadline while minimizing the number of test subjects, or in simulation-based studies run on the cloud, where we can elastically scale the number of virtual machines up or down to conduct as many experiments as we wish, but must pay for the resource-time used. Because the decision-maker can make only T decisions, it may need to pull some arms excessively, relative to a sequential algorithm, in order to perform well on all possible problems. We formalize this added difficulty with two hardness results which indicate that, unlike in sequential settings, the ability to adapt to the problem difficulty is constrained by the finite deadline. We propose Elastic Batch Racing (EBR), a novel algorithm for this setting, and bound its sample complexity, showing that EBR is optimal with respect to both hardness results. We present simulations evaluating EBR in this setting, where it outperforms baselines by several orders of magnitude.
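To make the deadline setting concrete, the following is a minimal, illustrative sketch of batched elimination (racing) under a deadline of T rounds; it is not the paper's EBR algorithm. On each round, every surviving arm is pulled in a batch, and arms whose confidence interval falls ε below the best arm's lower confidence bound are eliminated. The function name batched_racing, the doubling batch schedule, and the Hoeffding-style confidence radius are all assumptions made for illustration.

import numpy as np

def batched_racing(means, T, epsilon=0.1, delta=0.05, rng=None):
    # Illustrative batched elimination under a deadline of T rounds.
    # A hedged sketch of the setting only, NOT the paper's EBR algorithm.
    rng = np.random.default_rng() if rng is None else rng
    K = len(means)
    active = list(range(K))          # arms still in contention
    counts = np.zeros(K)             # pulls per arm
    sums = np.zeros(K)               # cumulative reward per arm
    total_pulls = 0

    for t in range(1, T + 1):
        # Assumed schedule: double the batch size each round so the later
        # rounds can resolve the smallest gaps the deadline allows.
        batch = 2 ** t
        for i in active:
            sums[i] += rng.binomial(batch, means[i])   # Bernoulli rewards
            counts[i] += batch
            total_pulls += batch

        # Hoeffding-style radius with a naive union bound over arms and rounds.
        mu_hat = sums[active] / counts[active]
        rad = np.sqrt(np.log(2 * K * T / delta) / (2 * counts[active]))
        lcb_best = np.max(mu_hat - rad)
        # Keep only arms that could still be epsilon-optimal.
        active = [a for a, m, r in zip(active, mu_hat, rad)
                  if m + r >= lcb_best - epsilon]
        if len(active) == 1:
            break

    # Recommend the surviving arm with the highest empirical mean.
    best = max(active, key=lambda a: sums[a] / counts[a])
    return best, total_pulls

For instance, batched_racing([0.9, 0.8, 0.5, 0.4], T=8) returns a recommended arm index and the total number of samples spent. The round count T, not the sample count, is the binding constraint here, which mirrors the decisions-versus-samples distinction in the abstract: with few rounds remaining, the sketch must pull all surviving arms in large batches rather than adapt one pull at a time.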
