Improving Generalization in Meta Reinforcement Learning using Learned Objectives

Biological evolution has distilled the experiences of many learners into the general learning algorithms of humans. Our novel meta-reinforcement learning algorithm MetaGenRL is inspired by this process. MetaGenRL distills the experiences of many complex agents to meta-learn a low-complexity neural objective function that decides how future individuals will learn. Unlike recent meta-RL algorithms, MetaGenRL can generalize to new environments that are entirely different from those used for meta-training. In some cases, it even outperforms human-engineered RL algorithms. MetaGenRL uses off-policy second-order gradients during meta-training, which greatly increases its sample efficiency.
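
The core mechanism the abstract describes, meta-learning a neural objective function and differentiating through the policy update it induces, can be sketched compactly. Below is a minimal, hypothetical JAX illustration, not the paper's actual code: `objective_fn`, `inner_update`, and `meta_loss` are invented names, the objective network is a tiny MLP over per-timestep features (the paper uses a recurrent network), the policy is a toy linear-Gaussian, and `critic_q` is a stand-in critic supplying the off-policy learning signal.

```python
# Minimal sketch of MetaGenRL-style meta-gradients (illustrative only).
import jax
import jax.numpy as jnp

def objective_fn(phi, log_probs, rewards):
    # Learned neural objective L_phi: a tiny MLP over per-timestep
    # features (log pi(a|s), reward). Assumed architecture, not the paper's.
    feats = jnp.stack([log_probs, rewards], axis=-1)
    h = jnp.tanh(feats @ phi["w1"] + phi["b1"])
    return jnp.sum(h @ phi["w2"] + phi["b2"])

def policy_log_prob(theta, states, actions):
    # Toy Gaussian policy: mean = states @ theta, unit variance.
    mean = states @ theta
    return -0.5 * jnp.sum((actions - mean) ** 2, axis=-1)

def inner_update(theta, phi, batch, alpha=1e-3):
    # The agent's update: one gradient step on the *learned* objective.
    def loss(theta):
        lp = policy_log_prob(theta, batch["states"], batch["actions"])
        return objective_fn(phi, lp, batch["rewards"])
    return theta - alpha * jax.grad(loss)(theta)

def meta_loss(phi, theta, batch, critic_q):
    # Meta-objective: after updating theta with L_phi, a critic should
    # score the new policy's actions highly. Differentiating this through
    # inner_update w.r.t. phi yields the second-order gradients the
    # abstract refers to; since the batch can come from a replay buffer,
    # the meta-gradient is off-policy.
    theta_new = inner_update(theta, phi, batch)
    new_actions = batch["states"] @ theta_new
    return -jnp.mean(critic_q(batch["states"], new_actions))

meta_grad = jax.grad(meta_loss)  # gradient of the meta-loss w.r.t. phi

# Example usage with random toy data and a stand-in critic:
key = jax.random.PRNGKey(0)
phi = {"w1": jnp.zeros((2, 8)), "b1": jnp.zeros(8),
       "w2": jnp.zeros((8, 1)), "b2": jnp.zeros(1)}
theta = jnp.zeros((4, 2))
batch = {"states": jax.random.normal(key, (32, 4)),
         "actions": jax.random.normal(key, (32, 2)),
         "rewards": jax.random.normal(key, (32,))}
critic_q = lambda s, a: jnp.sum(s, -1) + jnp.sum(a, -1)
print(jax.tree_util.tree_map(jnp.shape, meta_grad(phi, theta, batch, critic_q)))
```

Because the objective network's parameters phi are shared across agents and environments while each agent keeps its own theta, the learned objective, rather than any single policy, is what transfers to new environments.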
