Tom Schaul | Doina Precup | Pierre-Luc Bacon | Jean Harb
[1] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[2] Benjamin Recht, et al. Simple random search provides a competitive approach to reinforcement learning, 2018, ArXiv.
[3] Xi-Ren Cao, et al. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization, 1998, IEEE Trans. Control Syst. Technol.
[4] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[5] A. Keane, et al. Evolutionary Optimization of Computationally Expensive Problems via Surrogate Modeling, 2003.
[6] G. Box, et al. On the Experimental Attainment of Optimum Conditions, 1951.
[7] Nicolas Le Roux, et al. The Value Function Polytope in Reinforcement Learning, 2019, ICML.
[8] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[9] Rich Caruana, et al. Multitask Learning, 1998, Encyclopedia of Machine Learning and Data Mining.
[10] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[11] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[12] J. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation, 1992.
[13] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[14] Philip Thomas, et al. Bias in Natural Actor-Critic Algorithms, 2014, ICML.
[15] David A. Cohn, et al. Active Learning with Statistical Models, 1996, NIPS.
[16] Michèle Sebag, et al. Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy, 2012, GECCO '12.
[17] P. Glynn, et al. Likelihood Ratio Gradient Estimation for Steady-State Parameters, 2017, Stochastic Systems.
[18] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[19] Satinder Singh, et al. Value Prediction Network, 2017, NIPS.
[20] Tom Schaul, et al. The Predictron: End-To-End Learning and Planning, 2016, ICML.
[21] Tom Schaul, et al. Natural Evolution Strategies, 2008, IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[22] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[23] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[24] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[25] David H. Wolpert, et al. No free lunch theorems for optimization, 1997, IEEE Trans. Evol. Comput.
[26] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, ArXiv.
[27] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, ArXiv.
[28] Richard S. Sutton. Temporal credit assignment in reinforcement learning, 1984.
[29] Andrew W. Moore, et al. Memory-based Stochastic Optimization, 1995, NIPS.
[30] Koray Kavukcuoglu, et al. Pixel Recurrent Neural Networks, 2016, ICML.
[31] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[32] Shimon Whiteson, et al. TreeQN and ATreeC: Differentiable Tree Planning for Deep Reinforcement Learning, 2018, ICLR.
[33] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[34] John E. Dennis, et al. Optimization Using Surrogate Objectives on a Helicopter Test Example, 1998.
[35] Donald R. Jones, et al. A Taxonomy of Global Optimization Methods Based on Response Surfaces, 2001, J. Glob. Optim.
[36] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[37] Alex Graves, et al. Decoupled Neural Interfaces using Synthetic Gradients, 2016, ICML.
[38] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[39] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[40] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[41] Kenneth O. Stanley, et al. Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning, 2017, ArXiv.
[42] Marc G. Bellemare, et al. A Comparative Analysis of Expected and Distributional Reinforcement Learning, 2019, AAAI.
[43] Tom Schaul, et al. Successor Features for Transfer in Reinforcement Learning, 2016, NIPS.
[44] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.