Policy Gradient Methods for Robotics
[1] M. Ciletti,et al. The computation and theory of optimal control , 1972 .
[2] David Q. Mayne,et al. Differential dynamic programming , 1972, The Mathematical Gazette.
[3] L. Hasdorff. Gradient Optimization and Nonlinear Control , 1976 .
[4] Peter W. Glynn,et al. Likelihood ratio gradient estimation: an overview , 1987, WSC '87.
[5] R. Fletcher. Practical Methods of Optimization , 1988 .
[6] Peter W. Glynn,et al. Likelihood ratio gradient estimation for stochastic systems , 1990, CACM.
[7] Peter W. Glynn,et al. Gradient estimation for ratios , 1991, 1991 Winter Simulation Conference Proceedings..
[8] V. Gullapalli,et al. Associative reinforcement learning of real-valued functions , 1991, Conference Proceedings 1991 IEEE International Conference on Systems, Man, and Cybernetics.
[9] Oliver G. Selfridge,et al. Real-time learning: a ball on a beam , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.
[10] V. Gullapalli,et al. Acquiring robot skills via reinforcement learning , 1994, IEEE Control Systems.
[11] S. Schaal,et al. A Kendama Learning Robot Based on Bi-directional Theory , 1996, Neural Networks.
[12] Judy A. Franklin,et al. Biped dynamic walking using reinforcement learning , 1997, Robotics Auton. Syst..
[13] Vijay Balasubramanian,et al. Statistical Inference, Occam's Razor, and Statistical Mechanics on the Space of Probability Distributions , 1996, Neural Computation.
[14] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[15] Shigenobu Kobayashi,et al. Reinforcement learning for continuous action using stochastic gradient ascent , 1998 .
[16] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[17] R. Fletcher,et al. Practical Methods of Optimization , 2000 .
[18] Jon Rigelsford,et al. Modelling and Control of Robot Manipulators , 2000 .
[19] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[20] J. Baxter,et al. Direct gradient-based reinforcement learning , 2000, 2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No.00CH36353).
[21] Lex Weaver,et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.
[22] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[23] A. Berny,et al. Statistical machine learning and combinatorial optimization , 2001 .
[24] Shin Ishii,et al. Reinforcement Learning for Biped Locomotion , 2002, ICANN.
[25] Alison L Gibbs,et al. On Choosing and Bounding Probability Metrics , 2002, math/0209021.
[26] J. Spall. Stochastic Optimization , 2002 .
[27] Jun Nakanishi,et al. Learning rhythmic movements by demonstration using nonlinear oscillators , 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[28] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .
[29] Jeff G. Schneider,et al. Covariant Policy Search , 2003, IJCAI.
[30] Noah J. Cowan,et al. Efficient Gradient Estimation for Motor Control Learning , 2002, UAI.
[31] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
[32] Shin Ishii,et al. Reinforcement Learning for CPG-Driven Biped Robot , 2004, AAAI.
[33] Tim Hesterberg,et al. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control , 2004, Technometrics.
[34] Peter Stone,et al. Policy gradient reinforcement learning for fast quadrupedal locomotion , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.
[35] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[36] Shin Ishii,et al. Natural Policy Gradient Reinforcement Learning for a CPG Control of a Biped Robot , 2004, PPSN.
[37] Takayuki Kanda,et al. Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[38] H. Sebastian Seung,et al. Learning to Walk in 20 Minutes , 2005 .
[39] Jun Morimoto,et al. Learning CPG Sensory Feedback with Policy Gradient for Biped Locomotion for a Full-Body Humanoid , 2005, AAAI.
[40] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[41] Florentin Wörgötter,et al. Fast Biped Walking with a Sensor-driven Neuronal Controller and Real-time Online Learning , 2006, Int. J. Robotics Res..
[42] James C. Spall,et al. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control , 2007 .
[43] Jan Peters,et al. Machine Learning for motor skills in robotics , 2008, Künstliche Intell..
[44] P. Glynn. Likelihood ratio gradient estimation: an overview , 2022 .