Learning While Optimizing an Unknown Fitness Surface

This paper investigates Reinforcement Learning (RL) applied to online parameter tuning in Stochastic Local Search (SLS) methods. In particular, we consider a novel application of RL within the Reactive Tabu Search (RTS) method, where the appropriate amount of diversification in prohibition-based (tabu) local search is adapted rapidly online to the characteristics of the task and of the local configuration. We model the parameter-tuning policy as a Markov Decision Process whose states summarize relevant information about the recent history of the search, and we determine a near-optimal policy with the Least-Squares Policy Iteration (LSPI) method. Preliminary experiments on Maximum Satisfiability (MAX-SAT) instances show very promising results, indicating that the learnt policy is competitive with previously proposed reactive strategies.
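To make the approach concrete, the following is a minimal sketch of LSPI (policy iteration with LSTDQ evaluation over linear features) applied to a toy version of the tuning problem. Everything here is an illustrative assumption, not the paper's actual formulation: states discretize the recent repetition rate of the search trajectory, actions decrease/keep/increase the tabu prohibition, and the transition/reward model is synthetic.

```python
import numpy as np

# Toy model (illustrative assumptions, not the paper's MDP):
# states 0..2 = low/medium/high repetition in the recent search history,
# actions 0..2 = decrease/keep/increase the tabu tenure.
N_STATES, N_ACTIONS = 3, 3
GAMMA = 0.9

def phi(s, a):
    """One-hot feature vector over (state, action) pairs."""
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f

def simulate(s, a):
    """Synthetic dynamics: matching the tenure adjustment to the repetition
    level (raise when high, lower when low) reduces repetition and pays off."""
    good = (a == s)
    r = 1.0 if good else -0.1
    s2 = max(0, s - 1) if good else min(N_STATES - 1, s + 1)
    return r, s2

def lstdq(samples, w):
    """One LSTDQ step: solve A w' = b for the policy greedy w.r.t. w."""
    k = N_STATES * N_ACTIONS
    A = 1e-6 * np.eye(k)  # small ridge term for numerical safety
    b = np.zeros(k)
    for (s, a, r, s2) in samples:
        a2 = max(range(N_ACTIONS), key=lambda x: phi(s2, x) @ w)  # greedy successor action
        f = phi(s, a)
        A += np.outer(f, f - GAMMA * phi(s2, a2))
        b += r * f
    return np.linalg.solve(A, b)

def lspi(n_samples=2000, iters=10, seed=0):
    """Collect samples with a random behaviour policy, then iterate LSTDQ."""
    rng = np.random.default_rng(seed)
    samples, s = [], int(rng.integers(N_STATES))
    for _ in range(n_samples):
        a = int(rng.integers(N_ACTIONS))
        r, s2 = simulate(s, a)
        samples.append((s, a, r, s2))
        s = s2
    w = np.zeros(N_STATES * N_ACTIONS)
    for _ in range(iters):
        w = lstdq(samples, w)
    # Return the greedy policy: best tenure adjustment per repetition level.
    return [max(range(N_ACTIONS), key=lambda a: phi(s, a) @ w) for s in range(N_STATES)]
```

With deterministic toy dynamics and one-hot features, LSTDQ amounts to exact policy evaluation once every (state, action) pair is sampled, so `lspi()` recovers the intuitive policy: lower the tenure when repetition is low and raise it when repetition is high.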
