Bandits attack function optimization

We consider function optimization as a sequential decision-making problem under a budget constraint that limits the number of objective-function evaluations allowed during the optimization. We study an algorithm inspired by a continuous version of the multi-armed bandit problem, which attacks this optimization problem by resolving the tradeoff between exploration (an initial quasi-uniform search of the domain) and exploitation (local optimization around potential global maxima). We introduce the so-called Simultaneous Optimistic Optimization (SOO) algorithm, a deterministic method that works by hierarchically partitioning the domain. The benefits of such an approach are guarantees on the returned solution and the numerical efficiency of the algorithm. We present this machine-learning-rooted approach to optimization and provide an empirical assessment of SOO on the test suite of the CEC'2014 competition on single-objective real-parameter numerical optimization.
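To make the partitioning idea concrete, below is a minimal Python sketch of an SOO-style loop: cells are trisected along their longest side, the objective is evaluated at cell centers, and in each sweep the best leaf at every depth is expanded only if its value beats every leaf already expanded at shallower depths in that sweep. The trisection split, the sqrt(n) depth cap, and all identifiers here are illustrative assumptions, not the exact variant assessed in the paper.

```python
import math


def soo(f, lo, hi, budget):
    """Maximize f over the box [lo, hi] using at most `budget` evaluations."""
    dim = len(lo)
    evals = 1
    # Leaves of the partition tree, grouped by depth.
    # Each leaf is (cell_lo, cell_hi, value of f at the cell center).
    root_center = [(a + b) / 2.0 for a, b in zip(lo, hi)]
    leaves = {0: [(list(lo), list(hi), f(root_center))]}

    def split(c_lo, c_hi):
        """Trisect a cell along its longest side; evaluate each child's center."""
        nonlocal evals
        d = max(range(dim), key=lambda i: c_hi[i] - c_lo[i])
        step = (c_hi[d] - c_lo[d]) / 3.0
        children = []
        for k in range(3):
            k_lo, k_hi = list(c_lo), list(c_hi)
            k_lo[d] = c_lo[d] + k * step
            k_hi[d] = c_lo[d] + (k + 1) * step
            center = [(a + b) / 2.0 for a, b in zip(k_lo, k_hi)]
            children.append((k_lo, k_hi, f(center)))
            evals += 1
        return children

    while evals < budget:
        v_max = -math.inf
        h_cap = int(math.sqrt(evals))  # illustrative depth schedule h_max(n)
        for h in sorted(leaves):
            if h > h_cap or not leaves[h]:
                continue
            # Best leaf at this depth; expand it only if it beats every
            # leaf already expanded at shallower depths in this sweep.
            i = max(range(len(leaves[h])), key=lambda j: leaves[h][j][2])
            if leaves[h][i][2] > v_max:
                v_max = leaves[h][i][2]
                c_lo, c_hi, _ = leaves[h].pop(i)
                leaves.setdefault(h + 1, []).extend(split(c_lo, c_hi))
                if evals >= budget:
                    break

    best = max((leaf for group in leaves.values() for leaf in group),
               key=lambda leaf: leaf[2])
    center = [(a + b) / 2.0 for a, b in zip(best[0], best[1])]
    return center, best[2]


if __name__ == "__main__":
    # Toy demo: maximize -||x||^2 over [-5, 5]^2 with 200 evaluations.
    x, fx = soo(lambda x: -sum(v * v for v in x), [-5.0, -5.0], [5.0, 5.0], 200)
    print(x, fx)
```

The sweep over all depths is what makes the search simultaneous: shallow cells keep the search quasi-uniform over the domain while deep cells refine the currently best regions, which is exactly the exploration/exploitation tradeoff described above.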
