Intrinsically motivated neuroevolution for vision-based reinforcement learning

Neuroevolution, the artificial evolution of neural networks, has shown great promise on continuous reinforcement learning tasks that require memory. However, it is not yet directly applicable to realistic embedded agents that use high-dimensional inputs (e.g. raw video images), since such inputs require very large networks. In this paper, neuroevolution is combined with an unsupervised sensory pre-processor, or compressor, that is trained on images generated from the environment by the population of evolving recurrent neural network controllers. The compressor not only reduces the input dimensionality of the controllers, but also biases the search toward novel controllers by rewarding those that discover images it reconstructs poorly. The method is successfully demonstrated on a vision-based version of the well-known mountain car benchmark, where controllers receive only single high-dimensional visual images of the environment, from a third-person perspective, instead of the standard two-dimensional state vector comprising position and velocity.
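The pipeline the abstract describes can be illustrated in a few dozen lines. The sketch below is not the paper's implementation; it is a minimal stand-in that combines (a) an online vector-quantizing compressor that maps each image to a low-dimensional code and yields a reconstruction-error novelty signal, (b) a small recurrent controller whose flat weight vector is the genome, and (c) a toy truncation evolution strategy whose fitness adds an intrinsic bonus for poorly reconstructed images. All names (VQCompressor, RecurrentController, rollout), the fake image stream, and every constant are hypothetical; a real system would render actual mountain-car frames and would likely use a stronger evolution strategy such as CMA-ES or NES.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64-pixel "images", 8 codebook prototypes,
# a 6-unit recurrent hidden layer, 1 action output.
IMG_DIM, N_CODES, N_HID, N_OUT, BETA = 64, 8, 6, 1, 0.1


class VQCompressor:
    """Online vector quantizer: a codebook of prototype images.
    compress() returns per-prototype distances (the controller's low-dim
    input); the winner's reconstruction error doubles as a novelty signal."""

    def __init__(self, n_codes, img_dim, lr=0.05):
        self.codebook = rng.normal(0.0, 0.1, (n_codes, img_dim))
        self.lr = lr

    def compress(self, image):
        dists = np.linalg.norm(self.codebook - image, axis=1)
        return dists / (np.max(dists) + 1e-8)  # normalized distance vector

    def novelty(self, image):
        return float(np.min(np.linalg.norm(self.codebook - image, axis=1)))

    def train(self, image):
        # Move the winning prototype toward the observed image.
        winner = np.argmin(np.linalg.norm(self.codebook - image, axis=1))
        self.codebook[winner] += self.lr * (image - self.codebook[winner])


class RecurrentController:
    """Tiny fully recurrent network; the genome is its flat weight vector."""

    def __init__(self, n_in, n_hid, n_out, genome):
        n = n_in + n_hid + 1  # inputs + recurrent state + bias
        self.W = genome[: n_hid * n].reshape(n_hid, n)
        self.V = genome[n_hid * n :].reshape(n_out, n_hid)
        self.h = np.zeros(n_hid)

    def step(self, obs):
        x = np.concatenate([obs, self.h, [1.0]])
        self.h = np.tanh(self.W @ x)
        return np.tanh(self.V @ self.h)


def genome_size(n_in, n_hid, n_out):
    return n_hid * (n_in + n_hid + 1) + n_out * n_hid


def rollout(genome, comp, horizon=100):
    """One episode in a placeholder environment (a drifting random vector
    stands in for rendered mountain-car frames). Fitness = task reward +
    BETA * mean reconstruction error, the intrinsic novelty bonus."""
    ctrl = RecurrentController(N_CODES, N_HID, N_OUT, genome)
    task_reward, intrinsic = 0.0, 0.0
    state = rng.normal(size=IMG_DIM)  # fake "image" stream
    for _ in range(horizon):
        obs = ctrl_input = comp.compress(state)
        action = ctrl.step(ctrl_input)[0]
        intrinsic += comp.novelty(state)  # reward images reconstructed poorly
        comp.train(state)                 # compressor learns from the population
        state = 0.9 * state + 0.1 * rng.normal(size=IMG_DIM) + 0.05 * action
        task_reward += -abs(action)       # placeholder task objective
    return task_reward + BETA * intrinsic / horizon


def evolve(pop_size=20, n_gen=10, sigma=0.1):
    comp = VQCompressor(N_CODES, IMG_DIM)
    dim = genome_size(N_CODES, N_HID, N_OUT)
    pop = rng.normal(0.0, 0.5, (pop_size, dim))
    best = None
    for g in range(n_gen):
        fits = np.array([rollout(ind, comp) for ind in pop])
        order = np.argsort(fits)
        best = pop[order[-1]].copy()
        elite = pop[order[-pop_size // 4 :]]           # truncation selection
        parents = elite[rng.integers(len(elite), size=pop_size)]
        pop = parents + sigma * rng.normal(size=(pop_size, dim))
        print(f"gen {g}: best fitness {fits.max():.3f}")
    return best


if __name__ == "__main__":
    evolve()
```

Note the design point this toy makes concrete: the compressor is trained on whatever images the evolving controllers encounter, so the population and the compressor shape each other, and the novelty term steers search toward controllers that take the compressor into unfamiliar regions of image space.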
