Relational recurrent neural networks

Memory-based neural networks model temporal data by leveraging an ability to remember information for long periods. It is unclear, however, whether they can also perform complex relational reasoning with the information they remember. Here, we first confirm our intuitions that standard memory architectures may struggle at tasks that heavily involve an understanding of the ways in which entities are connected, i.e., tasks involving relational reasoning. We then address these deficits with a new memory module, the Relational Memory Core (RMC), which employs multi-head dot product attention to allow memories to interact. Finally, we test the RMC on a suite of tasks that may profit from more capable relational reasoning across sequential information, and show large gains in RL domains (e.g., Mini PacMan), program evaluation, and language modeling, achieving state-of-the-art results on the WikiText-103, Project Gutenberg, and GigaWord datasets.
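To make the mechanism concrete, the sketch below illustrates, under stated assumptions, how multi-head dot product attention can let a fixed set of memory slots interact with each other and with new input. Function and parameter names (relational_memory_step, Wq, Wk, Wv) are hypothetical, and this is a minimal sketch rather than the authors' implementation: the full RMC additionally applies a row-wise MLP with layer normalization and LSTM-style gating around this attention step.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relational_memory_step(memory, inputs, Wq, Wk, Wv, num_heads):
    """Hedged sketch of one RMC-style memory update.

    memory: (num_slots, d_model)  current memory slots
    inputs: (num_inputs, d_model) new inputs at this time step
    Wq, Wk, Wv: (d_model, d_model) projection matrices (hypothetical names)
    """
    d_model = memory.shape[1]
    d_head = d_model // num_heads

    # Memories attend over themselves plus the new inputs, so slots can
    # interact with each other and incorporate incoming information.
    mem_plus_input = np.concatenate([memory, inputs], axis=0)

    q = memory @ Wq            # queries come from the memory slots
    k = mem_plus_input @ Wk    # keys and values come from memory + input
    v = mem_plus_input @ Wv

    def split_heads(x):
        # (n, d_model) -> (num_heads, n, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    qh, kh, vh = split_heads(q), split_heads(k), split_heads(v)

    # Scaled dot-product attention, computed per head.
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(d_head)
    attended = softmax(scores, axis=-1) @ vh

    # Merge heads back; a residual connection keeps the memory shape fixed.
    attended = attended.transpose(1, 0, 2).reshape(memory.shape)
    return memory + attended  # the full model also gates and MLPs this output

# Hypothetical shapes: 4 memory slots, model width 8, 2 heads, 1 input token.
rng = np.random.default_rng(0)
mem = rng.normal(size=(4, 8))
x = rng.normal(size=(1, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
new_mem = relational_memory_step(mem, x, Wq, Wk, Wv, num_heads=2)
```

Unrolling this update over a sequence, one step per input, gives the recurrent core described above; the attention step is what allows stored memories to be combined relationally rather than read and written independently.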
