Learning While Optimizing an Unknown Fitness Surface

This paper investigates Reinforcement Learning (RL) applied to online parameter tuning in Stochastic Local Search (SLS) methods. In particular, we consider a novel application of RL within the Reactive Tabu Search (RTS) method, where the appropriate amount of diversification in prohibition-based (tabu) local search is adapted rapidly online to the characteristics of the task and of the local configuration. We model the parameter-tuning policy as a Markov Decision Process whose states summarize relevant information about the recent history of the search, and we determine a near-optimal policy with the Least-Squares Policy Iteration (LSPI) method. Preliminary experiments on Maximum Satisfiability (MAX-SAT) instances show very promising results, indicating that the learnt policy is competitive with previously proposed reactive strategies.
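To make the approach concrete, the following is a minimal sketch of LSPI (policy iteration with LSTDQ evaluation over linear features) applied to a toy version of the tuning problem. Everything here is an illustrative assumption, not the paper's actual formulation: states discretize the recent repetition rate of the search trajectory, actions decrease/keep/increase the tabu prohibition, and the transition/reward model is synthetic.

```python
import numpy as np

# Toy model (illustrative assumptions, not the paper's MDP):
# states 0..2 = low/medium/high repetition in the recent search history,
# actions 0..2 = decrease/keep/increase the tabu tenure.
N_STATES, N_ACTIONS = 3, 3
GAMMA = 0.9

def phi(s, a):
    """One-hot feature vector over (state, action) pairs."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f

def simulate(s, a):
    """Synthetic dynamics: matching the tenure adjustment to the repetition
    level (raise when high, lower when low) reduces repetition and pays off."""
    good = (a == s)
    r = 1.0 if good else -0.1
    s2 = max(0, s - 1) if good else min(N_STATES - 1, s + 1)
    return r, s2

def lstdq(samples, w):
    """One LSTDQ step: solve A w' = b for the policy greedy w.r.t. w."""
    k = N_STATES * N_ACTIONS
    A = 1e-6 * np.eye(k)  # small ridge term for numerical safety
    b = np.zeros(k)
    for (s, a, r, s2) in samples:
        a2 = max(range(N_ACTIONS), key=lambda x: phi(s2, x) @ w)  # greedy successor action
        f = phi(s, a)
        A += np.outer(f, f - GAMMA * phi(s2, a2))
        b += r * f
    return np.linalg.solve(A, b)

def lspi(n_samples=2000, iters=10, seed=0):
    """Collect samples with a random behaviour policy, then iterate LSTDQ."""
    rng = np.random.default_rng(seed)
    samples, s = [], int(rng.integers(N_STATES))
    for _ in range(n_samples):
        a = int(rng.integers(N_ACTIONS))
        r, s2 = simulate(s, a)
        samples.append((s, a, r, s2))
        s = s2
    w = np.zeros(N_STATES * N_ACTIONS)
    for _ in range(iters):
        w = lstdq(samples, w)
    # Return the greedy policy: best tenure adjustment per repetition level.
    return [max(range(N_ACTIONS), key=lambda a: phi(s, a) @ w) for s in range(N_STATES)]
```

With deterministic toy dynamics and one-hot features, LSTDQ amounts to exact policy evaluation once every (state, action) pair is sampled, so `lspi()` recovers the intuitive policy: lower the tenure when repetition is low and raise it when repetition is high.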
