
Meta-learning of Sequential Strategies

In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal predictors and reinforcement learners which behave as if they had a probabilistic model that allowed them to efficiently exploit task structure. Furthermore, we recast memory-based meta-learning within a Bayesian framework, showing that the meta-learned strategies are near-optimal because they amortize Bayes-filtered data, where the adaptation is implemented in the memory dynamics as a state-machine of sufficient statistics. Essentially, memory-based meta-learning translates the hard problem of probabilistic sequential inference into a regression problem.
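The closing claim, that meta-learning turns sequential inference into regression, can be illustrated with a toy case that is not taken from the report. The sketch below assumes a coin-flip task class with bias drawn from a uniform prior, and it replaces the recurrent memory with a lookup table indexed by the sufficient statistic (flips seen, heads seen). The names `meta_train` and `bayes_predictive` are hypothetical. For each memory state, the empirical frequency of the next flip being heads is the tabular minimizer of average log loss, and under these assumptions it should approach the Bayes posterior predictive `(heads + 1) / (n + 2)`.

```python
import random
from collections import defaultdict


def meta_train(num_tasks=100_000, seq_len=5, seed=0):
    """Fit a tabular next-flip predictor from sampled tasks (illustrative only).

    The memory state is (flips seen, heads seen), the sufficient statistic
    for a coin with unknown bias. For each state we record how often the
    next flip was heads; this frequency minimizes the average log loss
    of a tabular predictor over the sampled tasks.
    """
    rng = random.Random(seed)
    visits = defaultdict(int)
    next_heads = defaultdict(int)
    for _task in range(num_tasks):
        theta = rng.random()  # assumed task prior: bias ~ Uniform(0, 1)
        n = heads = 0
        for _flip in range(seq_len):
            x = 1 if rng.random() < theta else 0
            visits[(n, heads)] += 1
            next_heads[(n, heads)] += x
            # Memory update: a state machine over sufficient statistics.
            n += 1
            heads += x
    return {state: next_heads[state] / visits[state] for state in visits}


def bayes_predictive(n, heads):
    """Posterior predictive P(next = heads) under a uniform prior on the bias."""
    return (heads + 1) / (n + 2)


predictor = meta_train()
for state in [(0, 0), (2, 2), (4, 1)]:
    learned = predictor[state]
    exact = bayes_predictive(*state)
    print(f"state={state} learned={learned:.3f} bayes={exact:.3f}")
```

The table never sees the task parameter `theta` at prediction time, yet its entries track the Bayesian answer, which is the sense in which the learned strategy amortizes the filtering computation. A recurrent network trained on the same data would have to discover an equivalent state representation on its own.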

Yee Whye Teh | Razvan Pascanu | Nando de Freitas | Shane Legg | Kevin J. Miller | Joel Veness | András György | Simon Osindero | Nicolas Heess | Ian Osband | Alexander Pritzel | Silvia Chiappa | Matthew Botvinick | Pedro A. Ortega | Mohammad Gheshlaghi Azar | Hado van Hasselt | Pablo Sprechmann | Mark Rowland | Tim Genewein | Siddhant M. Jayakumar | Jane X. Wang | Zeb Kurth-Nelson | Tom McGrath | Neil C. Rabinowitz
