Bootstrapped Meta-Learning

Meta-learning empowers artificial intelligence to increase its efficiency by learning how to learn. Unlocking this potential involves overcoming a challenging meta-optimisation problem that often exhibits ill-conditioning and myopic meta-objectives. We propose an algorithm that tackles these issues by letting the meta-learner teach itself. The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under a chosen (pseudo-)metric. Focusing on meta-learning with gradients, we establish conditions that guarantee performance improvements and show that the improvement is related to the target distance. Thus, by controlling curvature, the distance measure can be used to ease meta-optimisation, for instance by reducing ill-conditioning. Further, the bootstrapping mechanism can extend the effective meta-learning horizon without requiring backpropagation through all updates. The algorithm is versatile and easy to implement. We achieve a new state of the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities by meta-learning efficient exploration in an ε-greedy Q-learning agent.
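The core mechanic described above can be sketched in a few lines: unroll the learner for some steps under the current meta-parameters, let it continue for a few more steps to form a bootstrap target, stop gradients through the target, and meta-optimise by minimising the distance to it under a chosen (pseudo-)metric. The JAX snippet below is a minimal toy illustration of this idea, not the paper's implementation; the quadratic task loss, the learned step size as the sole meta-parameter, the squared-Euclidean matching distance, and the horizons K and L are all simplifying assumptions.

```python
# Minimal sketch of bootstrapped meta-gradients (toy setting, illustrative names).
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # Toy quadratic task loss; stands in for the learner's objective.
    x, y = batch
    return jnp.mean((params * x - y) ** 2)

def inner_update(params, meta_params, batch):
    # One inner-learner step; here the meta-parameter is just a step size.
    grads = jax.grad(loss_fn)(params, batch)
    return params - meta_params * grads

def bootstrapped_meta_loss(meta_params, params, batches, K=3, L=2):
    # Unroll K inner steps; gradients flow through these w.r.t. meta_params.
    p = params
    for k in range(K):
        p = inner_update(p, meta_params, batches[k])
    # Continue L further steps to build the bootstrap target, then stop
    # gradients so the target acts as a fixed teaching signal.
    target = p
    for l in range(L):
        target = inner_update(target, meta_params, batches[K + l])
    target = jax.lax.stop_gradient(target)
    # Matching loss under a squared-Euclidean (pseudo-)metric.
    return jnp.sum((p - target) ** 2)

# Usage: a single meta-update on synthetic data.
keys = jax.random.split(jax.random.PRNGKey(0), 5)
batches = [(jax.random.normal(k, (8,)), jnp.ones((8,))) for k in keys]
params = jnp.array(0.0)
meta_params = jnp.array(0.1)  # meta-learned step size
meta_grad = jax.grad(bootstrapped_meta_loss)(meta_params, params, batches)
meta_params = meta_params - 0.01 * meta_grad
```

Note that only the K-step unroll is differentiated, so the effective horizon of K + L updates is obtained without backpropagating through all of them, which is the point of the bootstrapping mechanism.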
