An Adaptive Pursuit Strategy for Allocating Operator Probabilities

Learning the optimal probabilities with which to apply an exploration operator from a set of alternatives can be done through self-adaptation or through adaptive allocation rules. In this paper we consider the latter option. The allocation strategies discussed in the literature essentially belong to the class of probability-matching algorithms, which adapt the operator probabilities so that they match the reward distribution. We introduce an alternative adaptive allocation strategy, called the adaptive pursuit method, and compare it with the probability-matching approach in a non-stationary environment. Calculations and experimental results show the superior performance of the adaptive pursuit algorithm: when the reward distributions stay stationary for some time, it converges rapidly and accurately to an operator probability distribution that yields a much higher probability of selecting the currently optimal operator, and a much higher average reward, than the probability-matching strategy. Most importantly, the adaptive pursuit scheme remains sensitive to changes in the reward distributions and reacts swiftly to non-stationary shifts in the environment.
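The pursuit idea described in the abstract can be sketched as follows: keep a running reward estimate per operator, then push the probability of the current best operator toward a ceiling `p_max` and all others toward a floor `p_min`, so every operator stays testable when the environment shifts. This is a minimal illustrative sketch, not the paper's exact procedure; the hyperparameter names and values (`p_min`, `alpha`, `beta`) are assumptions chosen for clarity.

```python
import random

class AdaptivePursuit:
    """Sketch of an adaptive pursuit rule for operator selection.

    Hyperparameters are illustrative: p_min floors every operator's
    probability, alpha smooths reward estimates, beta controls how
    fast probabilities are pulled toward their targets.
    """

    def __init__(self, n_ops, p_min=0.1, alpha=0.8, beta=0.8):
        self.n = n_ops
        self.p_min = p_min                       # floor: keeps non-best operators testable
        self.p_max = 1.0 - (n_ops - 1) * p_min   # ceiling for the current best operator
        self.alpha = alpha                       # reward-estimate learning rate
        self.beta = beta                         # probability learning rate
        self.q = [1.0] * n_ops                   # running reward estimates
        self.p = [1.0 / n_ops] * n_ops           # operator probabilities

    def select(self):
        # Sample an operator index according to the current probabilities.
        return random.choices(range(self.n), weights=self.p)[0]

    def update(self, op, reward):
        # Exponential recency-weighted average of the observed reward.
        self.q[op] += self.alpha * (reward - self.q[op])
        # Pursue the current best operator: move its probability toward
        # p_max and every other operator's probability toward p_min.
        best = max(range(self.n), key=lambda i: self.q[i])
        for i in range(self.n):
            target = self.p_max if i == best else self.p_min
            self.p[i] += self.beta * (target - self.p[i])
```

Because the targets sum to one, the probability vector remains a valid distribution after every update; if the reward estimates later rank a different operator highest, the same pursuit step redirects probability mass toward it within a few updates.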