Neural Turing Machines

We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with via attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.
