Policy learning in resource-constrained optimization

We consider an optimization scenario in which resources are required to evaluate candidate solutions. The challenge we focus on is that certain resources, once used by an optimizer, must be committed to for some period of time, with the effect that some solutions may be temporarily non-evaluable during the optimization. Previous analysis has shown that evolutionary algorithms (EAs) can cope with this resourcing issue when augmented with static strategies for dealing with non-evaluable solutions, such as repairing, waiting, or penalizing them. Moreover, a suitable strategy can be selected offline for a resource-constrained problem if the resourcing issue is known in advance. In this paper we demonstrate that an EA that uses a reinforcement learning (RL) agent, here Sarsa(λ), to learn offline when to switch between the static strategies can be more effective than any of the static strategies themselves. We also show that learning the same task online with an adaptive strategy-selection method, here D-MAB, is not as effective; nevertheless, online learning remains an alternative to static strategies.
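
As a concrete illustration (not the implementation used in the paper), the sketch below shows a tabular Sarsa(λ) agent with replacing eligibility traces choosing among the three static strategies. The state encoding (degree of resource blockage), the reward signal, and the run_one_generation hook are illustrative assumptions introduced here for the example.

```python
import random
from collections import defaultdict

ACTIONS = ["repair", "wait", "penalize"]  # the three static strategies


def run_one_generation(strategy):
    """Hypothetical stand-in for one EA generation run under the chosen
    strategy; returns (reward, next_state). Replace with a real EA loop."""
    reward = random.random() - (0.2 if strategy == "wait" else 0.0)
    state = random.choice(["low_blockage", "high_blockage"])
    return reward, state


class SarsaLambdaSelector:
    """Tabular Sarsa(lambda) agent that picks which static
    constraint-handling strategy the EA should use next."""

    def __init__(self, alpha=0.1, gamma=0.9, lam=0.8, epsilon=0.1):
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.q = defaultdict(float)       # Q(s, a) value table
        self.trace = defaultdict(float)   # eligibility traces e(s, a)

    def choose(self, state):
        # epsilon-greedy selection over the three strategies
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, s, a, reward, s_next, a_next):
        # standard Sarsa(lambda) update with replacing traces:
        # delta = r + gamma * Q(s', a') - Q(s, a)
        delta = reward + self.gamma * self.q[(s_next, a_next)] - self.q[(s, a)]
        self.trace[(s, a)] = 1.0
        for key in list(self.trace):
            self.q[key] += self.alpha * delta * self.trace[key]
            self.trace[key] *= self.gamma * self.lam


# Illustrative training loop; the states and rewards are assumptions.
agent = SarsaLambdaSelector()
s = "low_blockage"
a = agent.choose(s)
for gen in range(100):
    reward, s_next = run_one_generation(a)
    a_next = agent.choose(s_next)
    agent.update(s, a, reward, s_next, a_next)
    s, a = s_next, a_next
```

In the paper's offline setting, an agent of this kind would be trained on repeated runs of the resource-constrained problem before the learned switching policy is deployed; D-MAB, by contrast, adapts the strategy choice online during a single run.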
