Peter W. Glynn,et al. Likelihood ratio gradient estimation for stochastic systems , 1990, CACM.
 V. Gullapalli,et al. Associative reinforcement learning of real-valued functions , 1991, Conference Proceedings 1991 IEEE International Conference on Systems, Man, and Cybernetics.
 S. Schaal,et al. A Kendama Learning Robot Based on Bi-directional Theory , 1996, Neural Networks.
 Judy A. Franklin,et al. Biped dynamic walking using reinforcement learning , 1997, Robotics Auton. Syst.
 Vijay Balasubramanian,et al. Statistical Inference, Occam's Razor, and Statistical Mechanics on the Space of Probability Distributions , 1996, Neural Computation.
 Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
 R. Fletcher,et al. Practical Methods of Optimization , 2000 .
 Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
 J. Baxter,et al. Direct gradient-based reinforcement learning , 2000, 2000 IEEE International Symposium on Circuits and Systems.
 Lex Weaver,et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.
 A. Berny,et al. Statistical machine learning and combinatorial optimization , 2001 .
 Stochastic Optimization , 2002 .
 Jun Nakanishi,et al. Learning rhythmic movements by demonstration using nonlinear oscillators , 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.
 Noah J. Cowan,et al. Efficient Gradient Estimation for Motor Control Learning , 2002, UAI.
 Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
 Tim Hesterberg,et al. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control , 2004, Technometrics.
 Peter Stone,et al. Policy gradient reinforcement learning for fast quadrupedal locomotion , 2004, IEEE International Conference on Robotics and Automation (ICRA '04).
 Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning , 1992, Machine Learning.
 Shin Ishii,et al. Natural Policy Gradient Reinforcement Learning for a CPG Control of a Biped Robot , 2004, PPSN.
 Takayuki Kanda,et al. Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.
 Jun Morimoto,et al. Learning CPG Sensory Feedback with Policy Gradient for Biped Locomotion for a Full-Body Humanoid , 2005, AAAI.
 Florentin Wörgötter,et al. Fast Biped Walking with a Sensor-driven Neuronal Controller and Real-time Online Learning , 2006, Int. J. Robotics Res..
 P. Glynn. Likelihood Ratio Gradient Estimation: An Overview , 2022 .