GATED FAST WEIGHTS FOR ASSOCIATIVE RETRIEVAL

We improve previous end-to-end differentiable neural networks (NNs) equipped with fast weight memories. A gating mechanism updates the fast weights at every time step of a sequence through two separate outer-product-based matrices generated by the slow part of the network. The system is trained on a complex sequence-to-sequence variation of the Associative Retrieval Problem with roughly 70 times more temporal memory (i.e., time-varying variables) than similar-sized standard recurrent NNs (RNNs). In terms of accuracy and number of parameters, our architecture outperforms a variety of RNNs, including Long Short-Term Memory, Hypernetworks, and related fast weight architectures.
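A minimal NumPy sketch of the kind of update described above, purely for illustration: the slow net's output is split into vectors that form a gate matrix and a candidate matrix, both rank-one outer products, which interpolate the fast weight matrix element-wise. The split into g1, g2, h1, h2 and the convex interpolation are assumptions for this sketch; the paper's exact parameterization may differ.

```python
# Illustrative sketch of one gated fast-weight update step (assumed form).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fast_weight_step(F, slow_out):
    """Update the fast weight matrix F (d x d) from the slow net's output.

    slow_out is split into four d-dimensional vectors that form two
    outer-product matrices: a gate matrix G and a candidate matrix H.
    (This decomposition is an assumption made for illustration.)
    """
    g1, g2, h1, h2 = np.split(slow_out, 4)
    G = sigmoid(np.outer(g1, g2))      # element-wise gate in (0, 1)
    H = np.tanh(np.outer(h1, h2))      # candidate fast weights
    # Convex, element-wise interpolation between old and new fast weights.
    return (1.0 - G) * F + G * H

# Usage: the fast memory holds d*d time-varying variables, compared with
# only d hidden units in a standard RNN of similar size.
d = 16
F = np.zeros((d, d))
rng = np.random.default_rng(0)
for _ in range(5):                          # stand-in for a 5-step sequence
    slow_out = rng.standard_normal(4 * d)   # stand-in for the slow net's output
    F = gated_fast_weight_step(F, slow_out)
print(F.shape)                              # (16, 16) fast weights
```

Because the memory is the d x d matrix F rather than a length-d hidden state, the number of time-varying variables grows quadratically with the slow net's width, which is the source of the roughly 70-fold temporal-memory advantage claimed in the abstract.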
