Real-time reinforcement learning by sequential Actor-Critics and experience replay
[1] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[2] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009.
[3] Shie Mannor, et al. Reinforcement learning in the presence of rare events, 2008, ICML '08.
[4] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[5] P. Wawrzynski, et al. Learning to Control a 6-Degree-of-Freedom Walking Robot, 2007, EUROCON 2007 - The International Conference on "Computer as a Tool".
[6] Shalabh Bhatnagar, et al. Incremental Natural Actor-Critic Algorithms, 2007, NIPS.
[7] Pieter Abbeel, et al. Using inaccurate models in reinforcement learning, 2006, ICML.
[8] A. Pacut. Balanced Importance Sampling Estimation, 2006.
[9] P. Wawrzynski. Balanced Importance Sampling Estimation, 2006.
[10] Pieter Abbeel, et al. Exploration and apprenticeship learning in reinforcement learning, 2005, ICML.
[11] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[13] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[14] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[15] Peter Dayan, et al. Technical Note: Q-Learning, 2004, Machine Learning.
[16] S. Vijayakumar, et al. Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling, 2004.
[17] K. Doya, et al. Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling, 2004.
[18] Stefan Schaal, et al. Reinforcement Learning for Humanoid Robotics, 2003.
[19] Vijay R. Konda, et al. On Actor-Critic Algorithms, 2003, SIAM J. Control Optim.
[20] Dimitri P. Bertsekas, et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation, 2003, Discret. Event Dyn. Syst.
[21] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[22] Christian R. Shelton, et al. Policy Improvement for POMDPs Using Normalized Importance Sampling, 2001, UAI.
[23] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[24] Leonid Peshkin, et al. Bounds on Sample Size for Policy Evaluation in Markov Environments, 2001, COLT/EuroCOLT.
[25] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[26] P. Bartlett, et al. Stochastic optimization of controlled partially observable Markov decision processes, 2000, Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No.00CH37187).
[27] Kenji Doya, et al. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.
[28] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[29] Pawel Cichosz, et al. An Analysis of Experience Replay in Temporal Difference Learning, 1999, Cybern. Syst.
[30] Vijay R. Konda, et al. Actor-Critic Algorithms, 1999, NIPS.
[31] Shigenobu Kobayashi, et al. An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function, 1998, ICML.
[32] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[33] Long Lin, et al. Memory Approaches to Reinforcement Learning in Non-Markovian Domains, 1992.
[34] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.
[35] Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.
[36] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[37] Vijaykumar Gullapalli, et al. A stochastic reinforcement learning algorithm for learning real-valued functions, 1990, Neural Networks.
[38] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[39] David E. Orin, et al. Efficient Dynamic Computer Simulation of Robotic Mechanisms, 1982.
[40] Reuven Y. Rubinstein, et al. Simulation and the Monte Carlo method, 1981, Wiley series in probability and mathematical statistics.
[41] Carlos S. Kubrusly, et al. Stochastic approximation algorithms and applications, 1973, CDC 1973.