Meta-trained agents implement Bayes-optimal agents

Memory-based meta-learning is a powerful technique for building agents that adapt quickly to any task within a target distribution. A previous theoretical study argued that this remarkable performance arises because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but also share a similar computational structure, in the sense that either agent can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning can serve as a general technique for numerically approximating Bayes-optimal agents, even for task distributions for which no tractable model is currently known.
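The link between meta-training and Bayes-optimality can be made concrete with a toy sketch (not from the paper; the task setup and function names here are illustrative assumptions). Consider prediction tasks that are coin flips with a bias drawn from a uniform prior. The Bayes-optimal predictor's posterior predictive is the Laplace rule, p(next = 1) = (heads + 1) / (t + 2), and it attains a lower expected log loss over the task distribution than any fixed predictor. Since meta-training minimizes exactly this task-averaged loss, the Bayes-optimal predictor is what the meta-learning objective rewards:

```python
import math
import random

def average_log_losses(num_tasks=2000, T=20, seed=0):
    """Compare the Bayes-optimal (Laplace) predictor against a fixed
    p = 0.5 predictor, averaged over a distribution of coin-flip tasks."""
    rng = random.Random(seed)
    laplace_total = 0.0
    fixed_total = 0.0
    for _ in range(num_tasks):
        theta = rng.random()  # task parameter: coin bias ~ Uniform(0, 1) prior
        heads = 0
        for t in range(T):
            # Bayes posterior predictive under the uniform prior (Laplace rule).
            p_laplace = (heads + 1) / (t + 2)
            x = 1 if rng.random() < theta else 0
            laplace_total += -math.log(p_laplace if x else 1.0 - p_laplace)
            fixed_total += math.log(2)  # fixed predictor always outputs 0.5
            heads += x
    n = num_tasks * T
    return laplace_total / n, fixed_total / n
```

Averaged over many sampled tasks, the Laplace rule's per-step log loss falls below the fixed predictor's log 2 nats: adapting to the history pays off exactly because the loss is averaged over the prior, which is the quantity a meta-trained memory agent is optimized for.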
