Co-evolving recurrent neurons learn deep memory POMDPs

Recurrent neural networks are theoretically capable of learning complex temporal sequences, but training them with gradient descent is too slow and unstable for practical use in reinforcement learning environments. Neuroevolution, the evolution of artificial neural networks using genetic algorithms, can potentially solve real-world reinforcement learning tasks that require deep use of memory, i.e., memory spanning hundreds or thousands of inputs, by searching the space of recurrent neural networks directly. In this paper, we introduce a new neuroevolution algorithm, Hierarchical Enforced SubPopulations, that simultaneously evolves networks at two levels of granularity: full networks and their component neurons. We demonstrate the method on two POMDP tasks that involve temporal dependencies of up to thousands of time steps, and show that it is faster and simpler than the current best conventional reinforcement learning system on these tasks.
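The abstract gives no algorithmic detail, so the following is only a minimal sketch of what evolving at two levels of granularity could look like, not the authors' implementation. It assumes one neuron subpopulation per network position, a small store of elite full networks, and a toy recall task; the function names, the task, the mutation scheme, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def run_network(neurons, inputs):
    """Run a small fully recurrent net. Each neuron is [w_in, bias, w_rec_0, ...]."""
    state = [0.0] * len(neurons)
    for x in inputs:
        state = [
            math.tanh(n[0] * x + n[1] + sum(w * s for w, s in zip(n[2:], state)))
            for n in neurons
        ]
    return state[0]  # neuron 0 doubles as the output unit (assumption)


def fitness(neurons, delay=20):
    """Toy memory task (hypothetical): recall the first input after `delay` blank steps."""
    error = 0.0
    for cue in (-1.0, 1.0):
        out = run_network(neurons, [cue] + [0.0] * delay)
        error += (out - cue) ** 2
    return -error  # higher is better, 0.0 is perfect


def mutate(neuron, rng, scale=0.3):
    """Return a copy of the neuron with Gaussian noise added to every weight."""
    return [w + rng.gauss(0.0, scale) for w in neuron]


def evolve(hidden=3, sub_size=20, net_size=5, generations=30, trials=100, seed=0):
    rng = random.Random(seed)
    genes = 2 + hidden
    # Neuron level: one subpopulation per position in the network.
    subpops = [[[rng.gauss(0.0, 1.0) for _ in range(genes)]
                for _ in range(sub_size)] for _ in range(hidden)]
    nets = []      # network level: (fitness, neurons) pairs, best first
    history = []   # best fitness found so far, recorded once per generation
    for _ in range(generations):
        scores = [[[] for _ in range(sub_size)] for _ in range(hidden)]
        # Evaluate neurons through the randomly assembled networks they join.
        for _ in range(trials):
            picks = [rng.randrange(sub_size) for _ in range(hidden)]
            team = [list(subpops[i][p]) for i, p in enumerate(picks)]
            f = fitness(team)
            for i, p in enumerate(picks):
                scores[i][p].append(f)
            nets.append((f, team))
        # Network level: keep the elite networks and perturb them as whole units.
        nets.sort(key=lambda pair: pair[0], reverse=True)
        del nets[net_size:]
        for _, team in list(nets):
            child = [mutate(n, rng) for n in team]
            nets.append((fitness(child), child))
        nets.sort(key=lambda pair: pair[0], reverse=True)
        del nets[net_size:]
        history.append(nets[0][0])
        # Neuron level: rank by average network fitness, then breed the top quarter.
        for i in range(hidden):
            def avg(p, i=i):
                s = scores[i][p]
                return sum(s) / len(s) if s else float("-inf")
            order = sorted(range(sub_size), key=avg, reverse=True)
            elite = [subpops[i][p] for p in order[: sub_size // 4]]
            new = [list(n) for n in elite]
            while len(new) < sub_size - 1:
                a, b = rng.choice(elite), rng.choice(elite)
                cut = rng.randrange(1, genes)
                new.append(mutate(a[:cut] + b[cut:], rng))
            # Feed the best full network's neuron back into its subpopulation.
            new.append(list(nets[0][1][i]))
            subpops[i] = new
    return nets[0][1], history
```

Calling `best, history = evolve()` returns the best network found and the best fitness per generation. Because the elite networks are carried over unchanged, `history` never decreases; whether a given run actually solves the toy task depends on the seed and settings.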
