Gambling in a Computationally Expensive Casino: Algorithm Selection as a Bandit Problem

Automating algorithm selection and parameter tuning is an old dream of the AI community, which has been brought closer to reality in the last decade. Most available techniques are either oblivious, with no knowledge transfer across different problems, or are based on a model of algorithm performance, learned in a separate offline training sequence that is often prohibitively expensive. We describe recent work in which the problem is treated in a fully online setting: a model of algorithm performance can be learned and, at the same time, used to reduce the cost of learning it. The resulting exploration-exploitation trade-off can be treated in the context of bandit problems.
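
As a rough illustration of the framing in the abstract, the sketch below casts per-instance algorithm selection as an adversarial bandit in the spirit of Exp3 (the nonstochastic multi-armed bandit algorithm of Auer et al.): each candidate solver is an arm, the reward of the chosen solver is the fraction of the timeout it left unused, and the importance-weighted update keeps under-tried solvers in play. The `solvers` callables, the timeout, and the reward normalization are illustrative assumptions, not the actual interface or time-allocation scheme of the work described here.

```python
import math
import random

def exp3_algorithm_selection(solvers, instances, gamma=0.1, timeout=10.0):
    """Minimal Exp3-style sketch of per-instance algorithm selection.

    Assumes `solvers` is a list of callables solver(instance, timeout) that
    return the observed runtime in seconds (capped at `timeout`); these names
    and the reward normalization are hypothetical, for illustration only.
    """
    k = len(solvers)
    weights = [1.0] * k
    for instance in instances:
        total = sum(weights)
        # Exp3 mixes the weight-proportional distribution with uniform exploration.
        probs = [(1.0 - gamma) * w / total + gamma / k for w in weights]
        chosen = random.choices(range(k), weights=probs)[0]
        runtime = min(solvers[chosen](instance, timeout), timeout)
        reward = 1.0 - runtime / timeout  # reward in [0, 1]: faster solvers earn more
        # Importance-weighted update: only the pulled arm's weight changes.
        weights[chosen] *= math.exp(gamma * (reward / probs[chosen]) / k)
    return weights
```

The sketch commits to a single solver run per instance; the fully online setting described above instead interleaves learning the performance model with exploiting it, which is exactly the exploration-exploitation trade-off the bandit view makes explicit.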

[1] John R. Rice, et al. The Algorithm Selection Problem, 1976, Adv. Comput.

[2] Wayne Nelson, et al. Applied life data analysis, 1983.

[3] David Zuckerman, et al. Optimal speedup of Las Vegas algorithms, 1993, The 2nd Israel Symposium on Theory and Computing Systems.

[4] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[5] Chu Min Li, et al. Heuristics Based on Unit Propagation for Satisfiability Problems, 1997, IJCAI.

[6] Tad Hogg, et al. An Economics Approach to Hard Computational Problems, 1997, Science.

[7] Toby Walsh, et al. Morphing: Combining Structure and Randomness, 1999, AAAI/IAAI.

[8] Thomas Stützle, et al. SATLIB: An Online Resource for Research on SAT, 2000.

[9] Michail G. Lagoudakis, et al. Algorithm Selection using Reinforcement Learning, 2000, ICML.

[10] Michail G. Lagoudakis, et al. Reinforcement Learning for Algorithm Selection, 2000, AAAI/IAAI.

[11] Bart Selman, et al. Algorithm portfolios, 2001, Artif. Intell.

[12] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[13] Yoav Shoham, et al. Learning the Empirical Hardness of Optimization Problems: The Case of Combinatorial Auctions, 2002, CP.

[14] Eric Horvitz, et al. Dynamic restart policies, 2002, AAAI/IAAI.

[15] Marek Petrik. Statistically Optimal Combination of Algorithms, 2004.

[16] Jürgen Schmidhuber, et al. Adaptive Online Time Allocation to Search Algorithms, 2004, ECML.

[17] Bart Selman, et al. Heavy-Tailed Phenomena in Satisfiability and Constraint Satisfaction Problems, 2000, Journal of Automated Reasoning.

[18] Thomas Stützle, et al. Local Search Algorithms for SAT: An Empirical Evaluation, 2000, Journal of Automated Reasoning.

[19] Ricardo Vilalta, et al. Introduction to the Special Issue on Meta-Learning, 2004, Machine Learning.

[20] J. Christopher Beck, et al. Simple Rules for Low-Knowledge Algorithm Selection, 2004, CPAIOR.

[21] Yoav Shoham, et al. Understanding Random SAT: Beyond the Clauses-to-Variables Ratio, 2004, CP.

[22] Stephen F. Smith, et al. The Max K-Armed Bandit: A New Model of Exploration Applied to Search Heuristic Selection, 2005, AAAI.

[23] Chu Min Li, et al. Diversification and Determinism in Local Search for Satisfiability, 2005, SAT.

[24] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[25] Ricardo Vilalta, et al. A Perspective View and Survey of Meta-Learning, 2002, Artificial Intelligence Review.

[26] Frank Hutter, et al. Parameter Adjustment Based on Performance Prediction: Towards an Instance-Aware Problem Solver, 2005.

[27] Jürgen Schmidhuber, et al. A Neural Network Model for Inter-problem Adaptive Online Time Allocation, 2005, ICANN.

[28] J. Christopher Beck, et al. Applying Machine Learning to Low-Knowledge Control of Optimization Algorithms, 2005, Comput. Intell.

[29] Jürgen Schmidhuber, et al. Learning dynamic algorithm portfolios, 2006, Annals of Mathematics and Artificial Intelligence.

[30] Stephen F. Smith, et al. An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem, 2006, AAAI.

[31] Jürgen Schmidhuber, et al. Impact of Censored Sampling on the Performance of Restart Strategies, 2006, CP.

[32] Marek Petrik, et al. Learning Static Parallel Portfolios of Algorithms, 2006, ISAIM.

[33] Jürgen Schmidhuber, et al. Dynamic Algorithm Portfolios, 2006, AI&M.

[34] Jürgen Schmidhuber, et al. Learning Restart Strategies, 2007, IJCAI.

[35] H. Robbins. Some aspects of the sequential design of experiments, 1952.

[36] Marvin Rausand, et al. Life Data Analysis, 2008.