Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons

In many learning settings, active/adaptive querying is possible, but the number of rounds of adaptivity is limited. We study the relationship between query complexity and adaptivity in identifying the k most biased coins among a set of n coins with unknown biases. This problem is a common abstraction of many well-studied problems, including the problem of identifying the k best arms in a stochastic multi-armed bandit, and the problem of top-k ranking from pairwise comparisons. An r-round adaptive algorithm for the k most biased coins problem specifies in each round the set of coin tosses to be performed based on the observed outcomes in earlier rounds, and outputs the set of k most biased coins at the end of r rounds. When r = 1, the algorithm is known as non-adaptive; when r is unbounded, the algorithm is known as fully adaptive. While the power of adaptivity in reducing query complexity is well known, full adaptivity requires repeated interaction with the coin tossing (feedback generation) mechanism, and is highly sequential, since the set of coins to be tossed in each round can only be determined after we have observed the outcomes of the coin tosses from the previous round. In contrast, algorithms with only a few rounds of adaptivity require fewer rounds of interaction with the feedback generation mechanism, and offer the benefits of parallelism in algorithmic decision-making. Motivated by these considerations, we consider the question of how much adaptivity is needed to realize the optimal worst-case query complexity for identifying the k most biased coins. Given any positive integer r, we derive essentially matching upper and lower bounds on the query complexity of r-round algorithms. We then show that Θ(log∗ n) rounds are both necessary and sufficient for achieving the optimal worst-case query complexity for identifying the k most biased coins.
In particular, our algorithm achieves the optimal query complexity in at most log∗ n rounds; since log∗ n ≤ 5 for every n ≤ 2^65536, this implies that on any realistic input, 5 parallel rounds of exploration suffice to achieve the optimal worst-case query complexity. The best previously known algorithm required Θ(log n) rounds to achieve the optimal worst-case query complexity, even for the special case k = 1.
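To make the round structure concrete, the following is a minimal illustrative sketch of an r-round elimination scheme, not the algorithm from this paper: within each round, the set of tosses is fixed in advance (non-adaptively), and only between rounds do the observed empirical means determine which coins survive. The function name, the even per-round budget split, and the halving survivor schedule are all simplifying assumptions made for illustration.

```python
import random

def r_round_top_k(biases, k, r, budget_per_round, keep_frac=0.5, rng=None):
    """Illustrative r-round elimination sketch (NOT the paper's algorithm).

    Each round tosses every surviving coin the same number of times,
    chosen before the round starts; outcomes only influence which coins
    survive into the next round. `biases` is used here to simulate the
    unknown coins.
    """
    rng = rng or random.Random(0)
    survivors = list(range(len(biases)))
    for _ in range(r):
        if len(survivors) <= k:
            break
        tosses = budget_per_round // len(survivors)  # split round budget evenly
        means = {}
        for i in survivors:
            heads = sum(rng.random() < biases[i] for _ in range(tosses))
            means[i] = heads / tosses
        # keep the empirically best coins; never drop below k survivors
        survivors.sort(key=lambda i: means[i], reverse=True)
        survivors = survivors[:max(k, int(len(survivors) * keep_frac))]
    # output the k empirically best coins among the final survivors
    return set(survivors[:k])
```

For example, with two clearly biased coins among twenty, `r_round_top_k([0.9, 0.8] + [0.5] * 18, k=2, r=3, budget_per_round=20000)` identifies coins 0 and 1 in three rounds of interaction, whereas a fully adaptive algorithm would need a fresh round of interaction after essentially every toss.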
