Meta-trained agents implement Bayes-optimal agents

Memory-based meta-learning is a powerful technique for building agents that adapt quickly to any task within a target distribution. A previous theoretical study argued that this remarkable performance arises because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but also share a similar computational structure, in the sense that either agent can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning can serve as a general technique for numerically approximating Bayes-optimal agents, even for task distributions for which no tractable model is currently known.
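The link between meta-training and Bayes-optimality can be made concrete with a toy sketch (not from the paper; the task setup and function names here are illustrative assumptions). Consider prediction tasks that are coin flips with a bias drawn from a uniform prior. The Bayes-optimal predictor's posterior predictive is the Laplace rule, p(next = 1) = (heads + 1) / (t + 2), and it attains a lower expected log loss over the task distribution than any fixed predictor. Since meta-training minimizes exactly this task-averaged loss, the Bayes-optimal predictor is what the meta-learning objective rewards:

```python
import math
import random

def average_log_losses(num_tasks=2000, T=20, seed=0):
    """Compare the Bayes-optimal (Laplace) predictor against a fixed
    p = 0.5 predictor, averaged over a distribution of coin-flip tasks."""
    rng = random.Random(seed)
    laplace_total = 0.0
    fixed_total = 0.0
    for _ in range(num_tasks):
        theta = rng.random()  # task parameter: coin bias ~ Uniform(0, 1) prior
        heads = 0
        for t in range(T):
            # Bayes posterior predictive under the uniform prior (Laplace rule).
            p_laplace = (heads + 1) / (t + 2)
            x = 1 if rng.random() < theta else 0
            laplace_total += -math.log(p_laplace if x else 1.0 - p_laplace)
            fixed_total += math.log(2)  # fixed predictor always outputs 0.5
            heads += x
    n = num_tasks * T
    return laplace_total / n, fixed_total / n
```

Averaged over many sampled tasks, the Laplace rule's per-step log loss falls below the fixed predictor's log 2 nats: adapting to the history pays off exactly because the loss is averaged over the prior, which is the quantity a meta-trained memory agent is optimized for.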
