Reinforcement Learning for Humanoid Robots - Policy Gradients and Beyond
[1] R. Bellman. Dynamic programming , 1957, Science.
[2] Vijaykumar Gullapalli,et al. Learning Control Under Extreme Uncertainty , 1992, NIPS.
[3] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[4] Andrew G. Barto,et al. Adaptive linear quadratic control using policy iteration , 1994, Proceedings of 1994 American Control Conference - ACC '94.
[5] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[6] Andrew G. Barto,et al. Linear Least-Squares Algorithms for Temporal Difference Learning , 1996, Machine Learning.
[7] Judy A. Franklin,et al. Biped dynamic walking using reinforcement learning , 1997, Robotics Auton. Syst.
[8] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[9] Shigenobu Kobayashi,et al. Reinforcement learning for continuous action using stochastic gradient ascent , 1998.
[10] Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
[11] J. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes: implementation issues , 1999, Proceedings of the 38th IEEE Conference on Decision and Control (Cat. No.99CH36304).
[12] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[13] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[14] Karsten Berns,et al. Adaptive biologically inspired control for the four-legged walking machine BISAM , 1999.
[15] Vijay R. Konda,et al. Actor-Critic Algorithms , 1999, NIPS.
[16] T. Moon,et al. Mathematical Methods and Algorithms for Signal Processing , 1999.
[17] Chaouki T. Abdallah,et al. Linear Quadratic Control: An Introduction , 2000.
[18] Michail G. Lagoudakis,et al. Model-Free Least-Squares Policy Iteration , 2001, NIPS.
[19] Peter L. Bartlett,et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res.
[20] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[21] Jun Nakanishi,et al. Learning Attractor Landscapes for Learning Motor Primitives , 2002, NIPS.
[22] Jun Nakanishi,et al. Learning rhythmic movements by demonstration using nonlinear oscillators , 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[23] Ralf Schoknecht,et al. Optimality of Reinforcement Learning Algorithms with Linear Function Approximation , 2002, NIPS.
[24] Stefan Schaal,et al. Forward models in visuomotor control , 2002, Journal of neurophysiology.
[25] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst.
[26] Peter Dayan,et al. The convergence of TD(λ) for general λ , 1992, Machine Learning.
[27] Richard S. Sutton,et al. Reinforcement Learning , 1992, Handbook of Machine Learning.