Recurrent Orthogonal Networks and Long-Memory Tasks

Although RNNs have been shown to be powerful tools for processing sequential data, finding architectures or optimization strategies that allow them to model very long term dependencies is still an active area of research. In this work, we carefully analyze two synthetic datasets originally outlined in (Hochreiter and Schmidhuber, 1997) which are used to evaluate the ability of RNNs to store information over many time steps. We explicitly construct RNN solutions to these problems, and using these constructions, illuminate both the problems themselves and the way in which RNNs store different types of information in their hidden states. These constructions furthermore explain the success of recent methods that specify unitary initializations or constraints on the transition matrices.

[1]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[2]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[3]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[4]  Pat Langley,et al.  Crafting Papers on Machine Learning , 2000, ICML.

[5]  Jürgen Schmidhuber,et al.  Learning Precise Timing with LSTM Recurrent Networks , 2003, J. Mach. Learn. Res..

[6]  Haim Sompolinsky,et al.  Short-term memory in orthogonal neural networks. , 2004, Physical review letters.

[7]  Surya Ganguli,et al.  Memory traces in dynamical systems , 2008, Proceedings of the National Academy of Sciences.

[8]  Ilya Sutskever,et al.  Learning Recurrent Neural Networks with Hessian-Free Optimization , 2011, ICML.

[9]  Vysoké Učení,et al.  Statistical Language Models Based on Neural Networks , 2012 .

[10]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[11]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[12]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[13]  Surya Ganguli,et al.  Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.

[14]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[15]  Wojciech Zaremba,et al.  Recurrent Neural Network Regularization , 2014, ArXiv.

[16]  Marc'Aurelio Ranzato,et al.  Learning Longer Memory in Recurrent Neural Networks , 2014, ICLR.

[17]  Geoffrey E. Hinton,et al.  A Simple Way to Initialize Recurrent Networks of Rectified Linear Units , 2015, ArXiv.

[18]  Yoshua Bengio,et al.  Unitary Evolution Recurrent Neural Networks , 2015, ICML.

[19]  Roland Memisevic,et al.  Regularizing RNNs by Stabilizing Activations , 2015, ICLR.