[1] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[2] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[3] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[4] David K. Smith, et al. Dynamic Programming and Optimal Control, Volume 1, 1996.
[5] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[6] Elman Mansimov, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.
[7] Arash Tavakoli, et al. Action Branching Architectures for Deep Reinforcement Learning, 2017, AAAI.
[8] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[9] John N. Tsitsiklis, et al. Mean-Variance Optimization in Markov Decision Processes, 2011, ICML.
[10] Marek Petrik, et al. Biasing Approximate Dynamic Programming with a Lower Discount Factor, 2008, NIPS.
[11] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[12] Tommi S. Jaakkola, et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[13] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[14] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[15] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[16] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[17] John N. Tsitsiklis, et al. Bias and Variance Approximation in Value Function Estimates, 2007, Manag. Sci.
[18] Dale Schuurmans, et al. Smoothed Action Value Functions for Learning Gaussian Policies, 2018, ICML.
[19] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[20] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[21] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[22] Warren B. Powell, et al. Bias-corrected Q-learning to control max-operator bias in Q-learning, 2013, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[23] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[24] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[25] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[26] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[27] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[28] Ian Osband, et al. The Uncertainty Bellman Equation and Exploration, 2017, ICML.
[29] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[30] Nahum Shimkin, et al. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning, 2016, ICML.
[31] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[32] Richard S. Sutton, et al. Multi-step Off-policy Learning Without Importance Sampling Ratios, 2017, arXiv.
[33] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[34] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[35] Yang Liu, et al. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening, 2016, ICLR.
[36] Navdeep Jaitly, et al. Discrete Sequential Prediction of Continuous Actions for Deep RL, 2017, arXiv.
[37] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[38] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[39] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[40] Yuval Tassa, et al. Data-efficient Deep Reinforcement Learning for Dexterous Manipulation, 2017, arXiv.
[41] R. Mazo. On the Theory of Brownian Motion, 1973.
[42] Shimon Whiteson, et al. A theoretical and empirical analysis of Expected Sarsa, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[43] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[44] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[45] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[46] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[47] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[48] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[49] Mark D. Pendrith, et al. Estimator Variance in Reinforcement Learning: Theoretical Problems and Practical Solutions, 1997.