Learning Associative Inference Using Fast Weight Memory

Humans can quickly associate stimuli to solve problems in novel contexts. Our novel neural network model learns state representations of facts that can be composed to perform such associative inference. To this end, we augment the LSTM model with an associative memory, dubbed Fast Weight Memory (FWM). Through differentiable operations at every step of a given input sequence, the LSTM updates and maintains compositional associations stored in the rapidly changing FWM weights. Our model is trained end-to-end by gradient descent and yields excellent performance on compositional language reasoning problems, meta-reinforcement learning for POMDPs, and small-scale word-level language modelling.
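To make the idea of a slow LSTM controlling a rapidly changing associative memory concrete, here is a minimal sketch in PyTorch. It is not the paper's exact FWM update rule: the write operation shown (a gated outer-product association between a key and a value, read back with a query) is a simplifying assumption, and all module names and dimensions are illustrative.

```python
# Minimal sketch (assumed simplification, not the paper's exact FWM): an LSTM
# controller that, at every time step, writes an outer-product association into
# a fast weight matrix and reads from it with a query vector.
import torch
import torch.nn as nn


class FastWeightMemorySketch(nn.Module):
    def __init__(self, input_size, hidden_size, mem_size):
        super().__init__()
        # The controller sees the current input and the previous memory read-out.
        self.lstm = nn.LSTMCell(input_size + mem_size, hidden_size)
        # At every step the controller emits a write key, a write value,
        # a read query, and a scalar write strength.
        self.to_key = nn.Linear(hidden_size, mem_size)
        self.to_value = nn.Linear(hidden_size, mem_size)
        self.to_query = nn.Linear(hidden_size, mem_size)
        self.to_beta = nn.Linear(hidden_size, 1)
        self.mem_size = mem_size

    def forward(self, x_seq):
        # x_seq: (time, batch, input_size)
        batch = x_seq.size(1)
        h = x_seq.new_zeros(batch, self.lstm.hidden_size)
        c = torch.zeros_like(h)
        read = x_seq.new_zeros(batch, self.mem_size)
        # Fast weights: one rapidly changing matrix per sequence in the batch.
        F = x_seq.new_zeros(batch, self.mem_size, self.mem_size)
        outputs = []
        for x_t in x_seq:
            h, c = self.lstm(torch.cat([x_t, read], dim=-1), (h, c))
            k = torch.tanh(self.to_key(h))
            v = torch.tanh(self.to_value(h))
            q = torch.tanh(self.to_query(h))
            beta = torch.sigmoid(self.to_beta(h)).unsqueeze(-1)  # (batch, 1, 1)
            # Differentiable write: blend a new key->value association into F.
            F = (1 - beta) * F + beta * torch.einsum('bi,bj->bij', v, k)
            # Differentiable read: associative retrieval with the query.
            read = torch.einsum('bij,bj->bi', F, q)
            outputs.append(h)
        return torch.stack(outputs), read
```

Because both the write and the read are differentiable, gradients flow through the fast weights across the whole sequence, so the model can be trained end-to-end by ordinary backpropagation, as described above.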
