Using Fast Weights to Attend to the Recent Past

Until recently, research on artificial neural networks was largely restricted to systems with only two kinds of variables: neural activities, which represent the current or recent input, and weights, which learn to capture regularities among inputs, outputs, and payoffs. There is no good reason for this restriction. Synapses have dynamics at many different time-scales, which suggests that artificial neural networks might benefit from variables that change more slowly than activities but much faster than the standard weights. These "fast weights" can be used to store temporary memories of the recent past, and they provide a neurally plausible way of implementing the kind of attention to the past that has recently proven helpful in sequence-to-sequence models. By using fast weights, we can avoid the need to store copies of neural activity patterns.

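The abstract does not spell out the update rule, so the following is only a minimal sketch of one common fast-weight formulation: a decaying outer-product associative memory applied for a short inner loop at each time step. All names, dimensions, and hyperparameter values below are illustrative assumptions, and details such as layer normalization inside the inner loop are omitted; this is not the paper's exact implementation.

```python
import numpy as np

def fast_weight_step(x_t, h_prev, A_prev, W_h, W_x, lam=0.95, eta=0.5, n_inner=1):
    """One recurrent step with a fast associative weight matrix A (sketch).

    The fast weights decay geometrically and are updated with the outer
    product of the previous hidden state; applying them for a few inner-loop
    iterations acts like attention to recently visited hidden states, without
    storing explicit copies of those states.
    """
    # Decay the fast memory and write the most recent hidden state into it.
    A = lam * A_prev + eta * np.outer(h_prev, h_prev)

    # Contribution of the slow weights and the current input.
    boundary = W_h @ h_prev + W_x @ x_t

    # Inner loop: let the fast weights pull the state toward recent patterns.
    h = np.tanh(boundary)
    for _ in range(n_inner):
        h = np.tanh(boundary + A @ h)
    return h, A


# Illustrative usage with random slow weights and a short input sequence.
rng = np.random.default_rng(0)
d_h, d_x = 8, 4
W_h = 0.1 * rng.standard_normal((d_h, d_h))
W_x = 0.1 * rng.standard_normal((d_h, d_x))
h, A = np.zeros(d_h), np.zeros((d_h, d_h))
for x_t in rng.standard_normal((5, d_x)):
    h, A = fast_weight_step(x_t, h, A, W_h, W_x)
```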