David Silver | Tom Zahavy | Yannick Schroecker | Sebastian Flennerhag | Hado van Hasselt | Satinder Singh
[1] Jing Peng,et al. Incremental multi-step Q-learning , 1994, Machine Learning.
[2] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[3] Sepp Hochreiter,et al. Learning to Learn Using Gradient Descent , 2001, ICANN.
[4] Geoffrey E. Hinton. Using fast weights to deblur old memories , 1987 .
[5] Wouter M. Koolen,et al. MetaGrad: Multiple Learning Rates in Online Learning , 2016, NIPS.
[6] Pieter Abbeel,et al. Some Considerations on Learning to Explore via Meta-Reinforcement Learning , 2018, ICLR.
[7] Daniel Guo,et al. Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning , 2020, ICML.
[8] Misha Denil,et al. Learned Optimizers that Scale and Generalize , 2017, ICML.
[9] Marcin Andrychowicz,et al. Learning to learn by gradient descent by gradient descent , 2016, NIPS.
[10] Junier B. Oliva,et al. Meta-Curvature , 2019, NeurIPS.
[11] Chen Liang,et al. AutoML-Zero: Evolving Machine Learning Algorithms From Scratch , 2020, ICML.
[12] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[13] Ryan P. Adams,et al. Gradient-based Hyperparameter Optimization through Reversible Learning , 2015, ICML.
[14] Joshua Achiam,et al. On First-Order Meta-Learning Algorithms , 2018, ArXiv.
[15] Razvan Pascanu,et al. Meta-Learning with Warped Gradient Descent , 2020, ICLR.
[16] Zeb Kurth-Nelson,et al. Learning to reinforcement learn , 2016, CogSci.
[17] Jürgen Schmidhuber,et al. A ‘Self-Referential’ Weight Matrix , 1993 .
[18] Seungjin Choi,et al. Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace , 2018, ICML.
[19] Marlos C. Machado,et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents , 2017, J. Artif. Intell. Res.
[20] Razvan Pascanu,et al. Meta-Learning with Latent Embedding Optimization , 2018, ICLR.
[21] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[22] Renjie Liao,et al. Understanding Short-Horizon Bias in Stochastic Meta-Optimization , 2018, ICLR.
[23] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .
[24] Razvan Pascanu,et al. Revisiting Natural Gradient for Deep Networks , 2013, ICLR.
[25] Rémi Munos,et al. Recurrent Experience Replay in Distributed Reinforcement Learning , 2018, ICLR.
[26] Hugo Larochelle,et al. Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples , 2019, ICLR.
[27] Massimiliano Pontil,et al. Online-Within-Online Meta-Learning , 2019, NeurIPS.
[28] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[29] David Silver,et al. Meta-Gradient Reinforcement Learning , 2018, NeurIPS.
[30] Leslie Pack Kaelbling,et al. Meta-learning curiosity algorithms , 2020, ICLR.
[31] Thomas L. Griffiths,et al. Reconciling meta-learning and continual learning with online mixtures of tasks , 2018, NeurIPS.
[32] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[33] Yoshua Bengio,et al. Learning a synaptic learning rule , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[34] Amos J. Storkey,et al. How to train your MAML , 2018, ICLR.
[35] Mohammad Ghavamzadeh,et al. Mirror Descent Policy Optimization , 2020, ArXiv.
[36] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
[37] Thomas L. Griffiths,et al. Recasting Gradient-Based Meta-Learning as Hierarchical Bayes , 2018, ICLR.
[38] Karen Simonyan,et al. Off-Policy Actor-Critic with Shared Experience Replay , 2020, ICML.
[39] Jing Peng,et al. Function Optimization using Connectionist Reinforcement Learning Algorithms , 1991 .
[40] Yee Whye Teh,et al. Distral: Robust multitask reinforcement learning , 2017, NIPS.
[41] Junhyuk Oh,et al. Discovering Reinforcement Learning Algorithms , 2020, NeurIPS.
[42] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[43] Michal Valko,et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.
[44] Marc Teboulle,et al. Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett..
[46] Jian Li,et al. Learning Gradient Descent: Better Generalization and Longer Horizons , 2017, ICML.
[47] Neil D. Lawrence,et al. Transferring Knowledge across Learning Processes , 2018, ICLR.
[48] Jeremy Nixon,et al. Understanding and correcting pathologies in the training of learned optimizers , 2018, ICML.
[49] Yoshua Bengio,et al. Gradient-Based Optimization of Hyperparameters , 2000, Neural Computation.
[50] Razvan Pascanu,et al. Policy Distillation , 2015, ICLR.
[51] Oriol Vinyals,et al. Matching Networks for One Shot Learning , 2016, NIPS.
[52] Junhyuk Oh,et al. A Self-Tuning Actor-Critic Algorithm , 2020, NeurIPS.
[53] Katherine D. Kinzler,et al. Core knowledge , 2007, Developmental Science.
[54] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[55] Satinder Singh,et al. On Learning Intrinsic Rewards for Policy Gradient Methods , 2018, NeurIPS.
[56] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[57] Misha Denil,et al. Learning to Learn for Global Optimization of Black Box Functions , 2016, ArXiv.
[58] Maria-Florina Balcan,et al. Adaptive Gradient-Based Meta-Learning Methods , 2019, NeurIPS.
[59] Katja Hofmann,et al. Fast Context Adaptation via Meta-Learning , 2018, ICML.
[60] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[61] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[62] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[63] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[64] Hugo Larochelle,et al. Optimization as a Model for Few-Shot Learning , 2016, ICLR.
[65] Tianlong Chen,et al. Learning to Optimize in Swarms , 2019, NeurIPS.
[66] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[67] Chelsea Finn,et al. Meta-Learning without Memorization , 2020, ICLR.
[68] Maria-Florina Balcan,et al. Provable Guarantees for Gradient-Based Meta-Learning , 2019, ICML.
[69] Nicol N. Schraudolph,et al. Local Gain Adaptation in Stochastic Gradient Descent , 1999 .