Koray Kavukcuoglu | Brendan O'Donoghue | Volodymyr Mnih | Rémi Munos
[1] Jing Peng,et al. Function Optimization using Connectionist Reinforcement Learning Algorithms , 1991 .
[2] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[3] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[4] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[5] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[6] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[7] David K. Smith,et al. Dynamic Programming and Optimal Control. Volume 1 , 1996 .
[8] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[9] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[10] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[11] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[12] Naoki Abe,et al. Sequential cost-sensitive decision making with reinforcement learning , 2002, KDD.
[13] Jeff G. Schneider,et al. Covariant Policy Search , 2003, IJCAI.
[14] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[15] Geoffrey E. Hinton,et al. Reinforcement Learning with Factored States and Actions , 2004, J. Mach. Learn. Res..
[16] Jing Peng,et al. Incremental multi-step Q-learning , 1994, Machine Learning.
[17] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[18] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[19] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[20] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[21] Shimon Whiteson,et al. A theoretical and empirical analysis of Expected Sarsa , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[22] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[23] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.
[24] Hilbert J. Kappen,et al. Dynamic policy programming , 2010, J. Mach. Learn. Res..
[25] Yee Whye Teh,et al. Actor-Critic Reinforcement Learning with Energy-Based Policies , 2012, EWRL.
[26] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.
[27] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[28] Tzuu-Hseng S. Li,et al. Backward Q-learning: The combination of Sarsa algorithm and Q-learning , 2013, Eng. Appl. Artif. Intell..
[29] Philip Thomas,et al. Bias in Natural Actor-Critic Algorithms , 2014, ICML.
[30] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[31] Razvan Pascanu,et al. Revisiting Natural Gradient for Deep Networks , 2013, ICLR.
[32] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[33] Doina Precup,et al. Policy Gradient Methods for Off-policy Control , 2015, ArXiv.
[35] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[36] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.
[37] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[38] Matthew Hausknecht,et al. On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning , 2016 .
[39] Dale Schuurmans,et al. Reward Augmented Maximum Likelihood for Neural Structured Prediction , 2016, NIPS.
[40] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[41] Roy Fox,et al. Taming the Noise in Reinforcement Learning via Soft Updates , 2015, UAI.
[42] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[43] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[44] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[45] Xinyun Chen. Delving into Transferable Adversarial Examples and Black-box Attacks , 2016 .
[46] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..
[47] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[48] Omer Levy,et al. Simulating Action Dynamics with Neural Process Networks , 2018, ICLR.