Memory-Efficient Backpropagation Through Time

We propose a novel approach to reduce memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs). Our approach uses dynamic programming to balance a trade-off between caching of intermediate results and recomputation. The algorithm is capable of tightly fitting within almost any user-set memory budget while finding an optimal execution policy minimizing the computational cost. Computational devices have limited memory capacity, and maximizing computational performance given a fixed memory budget is a practical use case. We provide asymptotic computational upper bounds for various regimes. The algorithm is particularly effective for long sequences. For sequences of length 1000, our algorithm saves 95% of memory usage while using only one third more time per iteration than the standard BPTT.
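As an illustration of the kind of dynamic program the abstract describes, the sketch below computes the minimal number of forward-step executions needed to backpropagate through a span of t timesteps when at most m hidden states can be cached at once. The function name `bptt_forward_cost`, the cost model (only forward re-executions are counted, and each timestep's backward is assumed to need only the hidden state entering that step), and the exact recurrence are illustrative assumptions, not the paper's verbatim formulation.

```python
def bptt_forward_cost(t, m):
    """Illustrative dynamic program for memory-constrained BPTT.

    Cost model (an assumption, not the paper's exact accounting):
    C[tau][mu] is the minimal number of forward-step executions needed
    so that every timestep's backward pass can be fed the hidden state
    it consumes, over a span of tau timesteps whose left-end state is
    already in memory, with at most mu additional cached hidden states.
    """
    INF = float("inf")
    # Table of subproblem costs; spans of length 0 or 1 need no extra
    # forwards because the left-end state is already available.
    C = [[0] * (m + 1) for _ in range(t + 1)]
    for tau in range(2, t + 1):
        # No cache slots: every needed state is recomputed from the left
        # end of the span, costing (tau - 1) + (tau - 2) + ... + 1 forwards.
        C[tau][0] = tau * (tau - 1) // 2
        for mu in range(1, m + 1):
            best = INF
            for y in range(1, tau):
                # Spend y forwards to cache the state at position y, solve
                # the right part with one slot fewer, then reuse the freed
                # slot on the remaining left part.
                best = min(best, y + C[tau - y][mu - 1] + C[y][mu])
            C[tau][mu] = best
    return C[t][m]
```

The recurrence chooses where to place the next checkpoint, pays the forward steps needed to reach it, solves the remaining suffix with one slot fewer, and reuses the freed slot on the prefix; filling the table over (span length, free slots) takes O(t^2 m) time and yields the optimal caching-versus-recomputation policy for the stated cost model.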
