Real-time reinforcement learning by sequential Actor-Critics and experience replay

[1]  Shalabh Bhatnagar,et al.  Natural actor-critic algorithms , 2009, Autom..

[2]  Shalabh Bhatnagar,et al.  Natural actor-critic algorithms , 2009 .

[3]  Shie Mannor,et al.  Reinforcement learning in the presence of rare events , 2008, ICML '08.

[4]  Stefan Schaal,et al.  Natural Actor-Critic , 2003, Neurocomputing.

[5]  P. Wawrzynski,et al.  Learning to Control a 6-Degree-of-Freedom Walking Robot , 2007, EUROCON 2007 - The International Conference on "Computer as a Tool".

[6]  Shalabh Bhatnagar,et al.  Incremental Natural Actor-Critic Algorithms , 2007, NIPS.

[7]  Pieter Abbeel,et al.  Using inaccurate models in reinforcement learning , 2006, ICML.

[8]  A. Pacut.  Balanced Importance Sampling Estimation , 2006 .

[9]  P. Wawrzynski.  Balanced Importance Sampling Estimation , 2006 .

[10]  Pieter Abbeel,et al.  Exploration and apprenticeship learning in reinforcement learning , 2005, ICML.

[11]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[12]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[13]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[14]  Peter Dayan,et al.  Q-learning , 1992, Machine Learning.

[15]  Peter Dayan,et al.  Technical Note: Q-Learning , 2004, Machine Learning.

[16]  S. Vijayakumar,et al.  Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling , 2004 .

[17]  K. Doya,et al.  Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling , 2004 .

[18]  Stefan Schaal,et al.  Reinforcement Learning for Humanoid Robotics , 2003 .

[19]  Vijay R. Konda,et al.  On Actor-Critic Algorithms , 2003, SIAM J. Control. Optim..

[20]  Dimitri P. Bertsekas,et al.  Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..

[21]  Leonid Peshkin,et al.  Learning from Scarce Experience , 2002, ICML.

[22]  Christian R. Shelton,et al.  Policy Improvement for POMDPs Using Normalized Importance Sampling , 2001, UAI.

[23]  Sanjoy Dasgupta,et al.  Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.

[24]  Leonid Peshkin,et al.  Bounds on Sample Size for Policy Evaluation in Markov Environments , 2001, COLT/EuroCOLT.

[25]  Sham M. Kakade,et al.  A Natural Policy Gradient , 2001, NIPS.

[26]  P. Bartlett,et al.  Stochastic optimization of controlled partially observable Markov decision processes , 2000, Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No.00CH37187).

[27]  Kenji Doya,et al.  Reinforcement Learning in Continuous Time and Space , 2000, Neural Computation.

[28]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[29]  Pawel Cichosz,et al.  An Analysis of Experience Replay in Temporal Difference Learning , 1999, Cybern. Syst..

[30]  Vijay R. Konda,et al.  Actor-Critic Algorithms , 1999, NIPS.

[31]  Shigenobu Kobayashi,et al.  An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function , 1998, ICML.

[32]  John N. Tsitsiklis,et al.  Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.

[33]  Long-Ji Lin,et al.  Memory Approaches to Reinforcement Learning in Non-Markovian Domains , 1992 .

[34]  Sridhar Mahadevan,et al.  Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..

[35]  Long-Ji Lin,et al.  Reinforcement learning for robots using neural networks , 1992 .

[36]  Richard S. Sutton,et al.  Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.

[37]  Vijaykumar Gullapalli,et al.  A stochastic reinforcement learning algorithm for learning real-valued functions , 1990, Neural Networks.

[38]  Richard S. Sutton,et al.  Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[39]  David E. Orin,et al.  Efficient Dynamic Computer Simulation of Robotic Mechanisms , 1982 .

[40]  Reuven Y. Rubinstein,et al.  Simulation and the Monte Carlo method , 1981, Wiley series in probability and mathematical statistics.

[41]  Carlos S. Kubrusly,et al.  Stochastic approximation algorithms and applications , 1973, CDC 1973.