Selection and Reinforcement Learning for Combinatorial Optimization

Building on a previous paper, we make explicit the relationship between reinforcement learning and selection learning (PBIL) algorithms for combinatorial optimization, understood here as the task of finding a fixed-length binary string that maximizes an arbitrary function. We show that searching for an optimal string is equivalent to searching for a probability distribution over strings that maximizes the expectation of the function. In this paper, however, we consider only the family of Bernoulli distributions. Next, we introduce two gradient dynamical systems acting on probability vectors. The first maximizes the expectation of the function and leads to reinforcement learning algorithms, whereas the second maximizes the logarithm of that expectation and leads to selection learning algorithms. Finally, we give a stability analysis of the solutions.
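
To make the two objectives concrete, the following is a minimal sketch, not the paper's own algorithms. It assumes independent Bernoulli bits with probability vector p, computes the expectation exactly by enumerating all strings (feasible only for very short strings), and takes plain Euler steps on the gradient of either the expectation or its logarithm. The function names, the step size, the clipping bound, and the toy objective `onemax_plus_one` are all illustrative assumptions.

```python
import itertools
import numpy as np


def expectation_and_gradient(f, p):
    """Exact E_p[f] and its gradient w.r.t. p for independent Bernoulli bits."""
    n = len(p)
    expectation = 0.0
    gradient = np.zeros(n)
    for bits in itertools.product((0, 1), repeat=n):
        x = np.array(bits, dtype=float)
        # Probability of string x under the product Bernoulli distribution.
        prob = np.prod(np.where(x == 1, p, 1 - p))
        value = f(x)
        expectation += value * prob
        # d/dp_i of P(x) equals P(x) * (x_i - p_i) / (p_i * (1 - p_i)).
        gradient += value * prob * (x - p) / (p * (1 - p))
    return expectation, gradient


def ascend(f, p0, log_objective, step=0.05, iters=500, eps=1e-3):
    """Euler steps on the gradient of E_p[f] or of log E_p[f] (assumed scheme)."""
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        e, g = expectation_and_gradient(f, p)
        if log_objective:
            g = g / e  # gradient of log E_p[f]; requires E_p[f] > 0
        # Keep p strictly inside (0, 1) so the gradient formula stays defined.
        p = np.clip(p + step * g, eps, 1 - eps)
    return p


def onemax_plus_one(x):
    """Hypothetical positive objective: number of ones in the string, plus one."""
    return float(x.sum()) + 1.0


p_expect = ascend(onemax_plus_one, [0.5, 0.5, 0.5], log_objective=False)
p_log = ascend(onemax_plus_one, [0.5, 0.5, 0.5], log_objective=True)
```

In this toy case both runs push every component of p toward 1, so the distribution concentrates on the all-ones string, which is the maximizer. The only difference between the two updates is the division by the current expectation, which rescales the step and requires the function to have a positive expectation.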
