Neural Programmer-Interpreters

We propose the neural programmer-interpreter (NPI): a recurrent and compositional neural network that learns to represent and execute programs. NPI has three learnable components: a task-agnostic recurrent core, a persistent key-value program memory, and domain-specific encoders that enable a single NPI to operate in multiple perceptually diverse environments with distinct affordances. By learning to compose lower-level programs to express higher-level programs, NPI reduces sample complexity and increases generalization ability compared to sequence-to-sequence LSTMs. The program memory allows efficient learning of additional tasks by building on existing programs. NPI can also harness the environment (e.g., a scratch pad with read-write pointers) to cache intermediate results of computation, lessening the long-term memory burden on recurrent hidden units. In this work we train the NPI with fully supervised execution traces; each program has example sequences of calls to its immediate subprograms, conditioned on the input. Rather than training on a huge number of relatively weak labels, NPI learns from a small number of rich examples. We demonstrate the capability of our model to learn several types of compositional programs: addition, sorting, and canonicalizing 3D models. Furthermore, a single NPI learns to execute these programs and all 21 associated subprograms.
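
To make the idea of an execution trace on a scratch pad more concrete, the following is a minimal sketch of what such a trace might look like for multi-digit addition. It is a hand-written toy interpreter, not the learned model: the function `add_with_trace`, the scratch pad layout, and the subprogram names (`ADD`, `ADD1`, `WRITE`, `CARRY`, `LSHIFT`) are assumptions chosen for illustration. It only shows how a higher-level program can be recorded as a sequence of calls to lower-level subprograms while intermediate results (here, carries) are cached in the environment.

```python
# Toy illustration only: a hand-written program that records its own call trace.
# All names and the scratch pad layout are hypothetical, not taken from the paper.

def add_with_trace(a: int, b: int):
    """Add two non-negative integers digit by digit on a scratch pad.

    Returns (result, trace), where trace is a list of
    (subprogram_name, arguments) tuples in call order.
    """
    # Store digits least-significant first so pointer 0 is the ones column.
    digits_a = [int(d) for d in str(a)][::-1]
    digits_b = [int(d) for d in str(b)][::-1]
    width = max(len(digits_a), len(digits_b))
    digits_a += [0] * (width - len(digits_a))
    digits_b += [0] * (width - len(digits_b))

    # Scratch pad rows: two inputs, a carry row, and an output row.
    pad = {
        "in1": digits_a,
        "in2": digits_b,
        "carry": [0] * (width + 1),
        "out": [0] * (width + 1),
    }

    trace = [("ADD", ())]  # top-level program call
    ptr = 0                # column pointer into the scratch pad
    while ptr < width:
        trace.append(("ADD1", (ptr,)))  # single-column addition
        total = pad["in1"][ptr] + pad["in2"][ptr] + pad["carry"][ptr]
        pad["out"][ptr] = total % 10
        trace.append(("WRITE", ("out", ptr, total % 10)))
        if total >= 10:
            # Cache the carry in the environment instead of in hidden state.
            pad["carry"][ptr + 1] = 1
            trace.append(("CARRY", (ptr + 1,)))
        trace.append(("LSHIFT", ()))  # advance the pointer to the next column
        ptr += 1

    # Copy any final carry into the most significant output column.
    pad["out"][width] = pad["carry"][width]
    trace.append(("WRITE", ("out", width, pad["carry"][width])))

    result = int("".join(str(d) for d in reversed(pad["out"])))
    return result, trace


if __name__ == "__main__":
    value, calls = add_with_trace(95, 7)
    print(value)
    for name, args in calls:
        print(name, args)
```

In a fully supervised setting such as the one described above, sequences like `calls` would play the role of training targets: at each step the model is shown which subprogram to invoke next, given the current state of the environment.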
