Neuroevolution for reinforcement learning using evolution strategies

We apply CMA-ES, an evolution strategy that efficiently adapts the covariance matrix of its mutation distribution, to optimizing the weights of neural networks for solving reinforcement learning problems. We find that the topology of the networks considerably influences the time needed to find a suitable control strategy. Still, our results with fixed network topologies are significantly better than those reported for the best evolutionary method so far, which adapts both the weights and the structure of the networks.
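As a rough illustration of the idea (not the setup used in the paper), the sketch below evolves the weights of a small fixed-topology network with a simplified evolution strategy. It keeps only weighted recombination of the mean and a rank-mu covariance update; the step-size control and evolution paths of full CMA-ES are omitted. The toy point-mass task, the network size, and all names and constants (`policy`, `episode_cost`, `simple_es`, `c_mu`, `sigma`) are assumptions made for this example.

```python
import numpy as np


def policy(weights, obs, n_hidden=2):
    """Fixed-topology network: len(obs) inputs -> n_hidden tanh units -> 1 tanh output."""
    n_in = obs.shape[0]
    w1 = weights[: n_in * n_hidden].reshape(n_hidden, n_in)
    w2 = weights[n_in * n_hidden:]
    return float(np.tanh(w2 @ np.tanh(w1 @ obs)))


def episode_cost(weights, steps=50, dt=0.1):
    """Hypothetical toy task: push a point mass from x = 1 toward the origin."""
    state = np.array([1.0, 0.0])  # position, velocity
    cost = 0.0
    for _ in range(steps):
        force = policy(weights, state)
        state = state + dt * np.array([state[1], force])  # explicit Euler step
        cost += state[0] ** 2  # penalize distance from the origin
    return cost


def simple_es(cost_fn, dim, generations=30, lam=12, sigma=0.5, seed=0):
    """Simplified ES with rank-mu covariance adaptation (NOT full CMA-ES)."""
    rng = np.random.default_rng(seed)
    mu = lam // 2
    # Log-decreasing recombination weights, normalized to sum to 1.
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    c_mu = 0.2  # assumed covariance learning rate, chosen for illustration
    mean, cov = np.zeros(dim), np.eye(dim)
    best_x, best_f = mean.copy(), cost_fn(mean)
    for _ in range(generations):
        # Sample mutation steps from N(0, C) and build the offspring.
        steps = rng.multivariate_normal(np.zeros(dim), cov, size=lam)
        pop = mean + sigma * steps
        costs = np.array([cost_fn(x) for x in pop])
        order = np.argsort(costs)[:mu]  # indices of the mu best offspring
        if costs[order[0]] < best_f:
            best_x, best_f = pop[order[0]].copy(), costs[order[0]]
        sel = steps[order]
        mean = mean + sigma * (w @ sel)  # weighted recombination
        # Rank-mu update: blend C with the weighted outer products of selected steps.
        cov = (1 - c_mu) * cov + c_mu * (sel.T * w) @ sel
        cov = (cov + cov.T) / 2  # keep C numerically symmetric
    return best_x, best_f


if __name__ == "__main__":
    # 2 inputs * 2 hidden units + 2 output weights = 6 parameters.
    weights, cost = simple_es(episode_cost, dim=6)
    print("best cost found:", cost)
```

Because the search tracks the best solution seen so far, starting from the all-zero weight vector, the returned cost can never exceed that of the untrained network. For real experiments a complete CMA-ES implementation with step-size adaptation would be the more appropriate choice.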
