Gated Fast Weights for On-The-Fly Neural Program Generation

We improve previous end-to-end differentiable neural networks (NNs) with fast weight memories. A gating mechanism updates the fast weights at every time step of a sequence through two separate outer-product-based matrices generated by slow parts of the architecture. The system is trained on a complex sequence-to-sequence variation of the Associative Retrieval Problem and maintains roughly 50 times more temporal memory (i.e., time-varying variables) than similar-sized standard recurrent NNs (RNNs). In terms of accuracy and number of parameters, our architecture outperforms a variety of RNNs, including Long Short-Term Memory, Hypernetworks, and related fast weight architectures. We relate this to metalearning through an experiment which shows how the slow weights can learn an online learning program that generates a smaller program able to answer a set of queries.
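
The abstract does not spell out the exact update rule, so the following is a minimal NumPy sketch of one plausible instantiation: the slow network emits two pairs of vectors whose outer products form an element-wise gate and a rank-one update candidate for the fast weight matrix. The function name and all vector names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fast_weight_update(F_prev, a_g, b_g, a_c, b_c):
    """One gated fast-weight update step (illustrative sketch).

    F_prev   : (n, n) fast weight matrix from the previous time step.
    a_g, b_g : vectors emitted by the slow network; their outer
               product defines the element-wise gate matrix.
    a_c, b_c : vectors emitted by the slow network; their outer
               product defines the rank-one update candidate.
    """
    G = sigmoid(np.outer(a_g, b_g))  # gate values in (0, 1)
    H = np.tanh(np.outer(a_c, b_c))  # rank-one candidate fast weights
    # The gate interpolates element-wise between the old fast
    # weights and the new candidate (assumed convex combination).
    return G * F_prev + (1.0 - G) * H

# Toy usage: n = 4 fast-weight units, random stand-ins for the
# slow network's per-time-step outputs.
rng = np.random.default_rng(0)
n = 4
F = np.zeros((n, n))
for t in range(10):
    a_g, b_g, a_c, b_c = rng.standard_normal((4, n))
    F = gated_fast_weight_update(F, a_g, b_g, a_c, b_c)
print(F.shape)  # (4, 4)
```

Because both the gate and the candidate are outer products of slow-net outputs, each step costs O(n^2) but is parameterized by only O(n) generated values, which is what lets the fast weights serve as a large pool of time-varying variables.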
