Evolving Deep LSTM-based Memory Networks using an Information Maximization Objective

Reinforcement learning agents with memory are constructed in this paper by extending the neuroevolutionary algorithm NEAT to incorporate LSTM cells, i.e. special memory units with gating logic. Initial evaluation on POMDP tasks indicated that memory solutions obtained by evolving LSTMs outperform traditional RNNs. Scaling neuroevolution of LSTMs to deep memory problems is challenging because (1) the fitness landscape is deceptive, and (2) a large number of associated parameters need to be optimized. To overcome these challenges, a new secondary optimization objective is introduced that maximizes the information (info-max) stored in the LSTM network. Network training is split into two phases. In the first (unsupervised) phase, independent memory modules are evolved by optimizing the info-max objective. In the second phase, the networks are trained by optimizing the task fitness. Results on two different memory tasks indicate that neuroevolution can discover powerful LSTM-based memory solutions that outperform traditional RNNs.
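
The two-phase schedule described above can be illustrated with a minimal Python sketch. It is not the paper's method: a toy tanh recurrent unit stands in for the LSTM, plain truncation selection with Gaussian mutation stands in for NEAT, a histogram entropy of the unit's states serves as an assumed proxy for the info-max objective, and the recall task is hypothetical. All function names are introduced here for illustration only.

```python
import math
import random
from collections import Counter


def entropy_bits(values, bins=8, lo=-1.0, hi=1.0):
    """Histogram estimate of the Shannon entropy (in bits) of activations in [lo, hi]."""
    n = len(values)
    counts = Counter(
        min(bins - 1, max(0, int((v - lo) / (hi - lo) * bins))) for v in values
    )
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def run_memory_unit(genome, inputs):
    """Toy recurrent unit (NOT an LSTM): h_t = tanh(w_in * x_t + w_rec * h_{t-1} + b)."""
    w_in, w_rec, b = genome
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + b)
        states.append(h)
    return states


def info_max_fitness(genome, inputs):
    """Assumed info-max proxy: entropy of the unit's state sequence."""
    return entropy_bits(run_memory_unit(genome, inputs))


def task_fitness(genome, inputs):
    """Hypothetical recall task: the final state should reproduce the first input."""
    return -(run_memory_unit(genome, inputs)[-1] - inputs[0]) ** 2


def evolve(population, fitness, generations, rng, sigma=0.2):
    """Truncation selection plus Gaussian mutation; the top half survives unchanged."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(1, len(ranked) // 2)]
        children = [
            [w + rng.gauss(0.0, sigma) for w in rng.choice(parents)]
            for _ in range(len(population) - len(parents))
        ]
        population = parents + children
    return population


def two_phase(pop_size=20, gens=(15, 15), seed=0):
    """Phase 1 optimizes the info-max proxy; phase 2 optimizes task fitness."""
    rng = random.Random(seed)
    inputs = [rng.uniform(-1.0, 1.0) for _ in range(30)]
    population = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(pop_size)]
    population = evolve(population, lambda g: info_max_fitness(g, inputs), gens[0], rng)
    population = evolve(population, lambda g: task_fitness(g, inputs), gens[1], rng)
    best = max(population, key=lambda g: task_fitness(g, inputs))
    return best, inputs
```

Calling `two_phase()` returns the best genome under the task fitness together with the input sequence it was evaluated on. The point of the sketch is only the ordering: the population handed to the task-fitness phase has already been shaped by the unsupervised objective.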
