Algorithm portfolio selection as a bandit problem with unbounded losses

We propose a method that learns to allocate computation time among a given set of algorithms of unknown performance, with the aim of solving a given sequence of problem instances in minimum time. Analogous meta-learning techniques are typically based on models of algorithm performance learned during a separate offline training sequence, which can be prohibitively expensive. We instead adopt an online approach, named GAMBLETA, in which algorithm performance models are iteratively updated and used to guide allocation on a sequence of problem instances. GAMBLETA is a general method for selecting among two or more alternative algorithm portfolios. Each portfolio has its own way of allocating computation time to the available algorithms, possibly based on performance models, in which case its performance is expected to improve over time as more runtime data becomes available. The resulting exploration-exploitation trade-off is represented as a bandit problem. In our previous work, the algorithms corresponded to the arms of the bandit, and the allocations proposed by the different portfolios were mixed using a solver for the bandit problem with expert advice; this, however, required setting an arbitrary bound on algorithm runtimes, which invalidated the solver's optimal regret guarantee. In this paper we propose a simpler version of GAMBLETA, in which the allocators correspond to the arms, so that a single portfolio is selected for each instance. The selection is represented as a bandit problem with partial information and an unknown bound on the losses. We devise a solver for this game and prove a bound on its expected regret. We present experiments based on results from several solver competitions in various domains, comparing GAMBLETA with another online method.
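
To make the bandit formulation concrete, below is a minimal sketch of an Exp3-style selector that picks one allocator (arm) per problem instance, observes only that allocator's runtime (partial information), and rescales losses on the fly because runtimes have no known upper bound. The class name `AllocatorBandit`, the doubling-scale heuristic, and the parameter values are illustrative assumptions, not the exact solver analysed in the paper.

```python
import math
import random


class AllocatorBandit:
    """Exp3-style sketch: one allocator is chosen per instance, only its
    runtime is observed, and the loss scale is grown adaptively because
    runtimes are unbounded (illustrative, not the paper's exact solver)."""

    def __init__(self, n_arms, gamma=0.1):
        self.n_arms = n_arms
        self.gamma = gamma            # exploration rate
        self.weights = [1.0] * n_arms
        self.scale = 1.0              # current guess of the loss bound

    def _probs(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n_arms
                for w in self.weights]

    def select(self):
        """Sample an allocator index from the mixed distribution."""
        r, acc = random.random(), 0.0
        for i, p in enumerate(self._probs()):
            acc += p
            if r <= acc:
                return i
        return self.n_arms - 1

    def update(self, arm, runtime):
        """Feed back the runtime of the chosen allocator on one instance."""
        while runtime > self.scale:   # unbounded losses: grow the scale
            self.scale *= 2.0
        loss = runtime / self.scale   # normalised loss in [0, 1]
        prob = self._probs()[arm]
        estimate = loss / prob        # importance-weighted loss estimate
        self.weights[arm] *= math.exp(-self.gamma * estimate / self.n_arms)


# Hypothetical usage: two allocators, e.g. a uniform time share and a
# model-based portfolio, evaluated on a short sequence of instances.
bandit = AllocatorBandit(n_arms=2)
for runtime_uniform, runtime_model in [(3.2, 5.0), (10.0, 2.1), (8.0, 1.7)]:
    arm = bandit.select()
    observed = (runtime_uniform, runtime_model)[arm]
    bandit.update(arm, observed)
```

Only the selected allocator's runtime enters the update, which is what makes the game a partial-information bandit rather than a full-information expert problem; the adaptive scale stands in for the unknown bound on losses discussed above.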
