Paper information - Real-time reinforcement learning by sequential Actor-Critics and experience replay - 字舞流文

Real-time reinforcement learning by sequential Actor-Critics and experience replay

Pawel Wawrzynski

[1] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..

[2] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009 .

[3] Shie Mannor,et al. Reinforcement learning in the presence of rare events , 2008, ICML '08.

[4] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.

[5] P. Wawrzynski,et al. Learning to Control a 6-Degree-of-Freedom Walking Robot , 2007, EUROCON 2007 - The International Conference on "Computer as a Tool".

[6] Shalabh Bhatnagar,et al. Incremental Natural Actor-Critic Algorithms , 2007, NIPS.

[7] Pieter Abbeel,et al. Using inaccurate models in reinforcement learning , 2006, ICML.

[8] A. Pacut. Balanced Importance Sampling Estimation , 2006 .

[9] P. Wawrzynski. Balanced Importance Sampling Estimation , 2006 .

[10] Pieter Abbeel,et al. Exploration and apprenticeship learning in reinforcement learning , 2005, ICML.

[11] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[12] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[13] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.

[14] Peter Dayan,et al. Q-learning , 1992, Machine Learning.

[15] Peter Dayan,et al. Technical Note: Q-Learning , 1992, Machine Learning.

[16] S. Vijayakumar,et al. Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling , 2004 .

[17] K. Doya,et al. Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling , 2004 .

[18] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .

[19] Vijay R. Konda,et al. On Actor-Critic Algorithms , 2003, SIAM J. Control. Optim..

[20] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..

[21] Leonid Peshkin,et al. Learning from Scarce Experience , 2002, ICML.

[22] Christian R. Shelton,et al. Policy Improvement for POMDPs Using Normalized Importance Sampling , 2001, UAI.

[23] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.

[24] Leonid Peshkin,et al. Bounds on Sample Size for Policy Evaluation in Markov Environments , 2001, COLT/EuroCOLT.

[25] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.

[26] P. Bartlett,et al. Stochastic optimization of controlled partially observable Markov decision processes , 2000, Proceedings of the 39th IEEE Conference on Decision and Control.

[27] Kenji Doya,et al. Reinforcement Learning in Continuous Time and Space , 2000, Neural Computation.

[28] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[29] Pawel Cichosz,et al. An Analysis of Experience Replay in Temporal Difference Learning , 1999, Cybern. Syst..

[30] Vijay R. Konda,et al. Actor-Critic Algorithms , 1999, NIPS.

[31] Shigenobu Kobayashi,et al. An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function , 1998, ICML.

[32] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.

[33] Long-Ji Lin,et al. Memory Approaches to Reinforcement Learning in Non-Markovian Domains , 1992 .

[34] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..

[35] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .

[36] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.

[37] Vijaykumar Gullapalli,et al. A stochastic reinforcement learning algorithm for learning real-valued functions , 1990, Neural Networks.

[38] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[39] David E. Orin,et al. Efficient Dynamic Computer Simulation of Robotic Mechanisms , 1982 .

[40] Reuven Y. Rubinstein,et al. Simulation and the Monte Carlo method , 1981, Wiley series in probability and mathematical statistics.

[41] Carlos S. Kubrusly,et al. Stochastic approximation algorithms and applications , 1973, CDC 1973.