Parameter-exploring policy gradients
Frank Sehnke | Christian Osendorfer | Jürgen Schmidhuber | Alex Graves | Jan Peters | Thomas Rückstieß
[1] Martin Lauer, et al. Making a Robot Learn to Play Soccer Using Reward and Punishment, 2007, KI.
[2] Judy A. Franklin, et al. Biped dynamic walking using reinforcement learning, 1997, Robotics Auton. Syst.
[3] Nicol N. Schraudolph, et al. Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation, 2005, NIPS.
[4] James C. Spall, et al. An Overview of the Simultaneous Perturbation Method for Efficient Optimization, 1998.
[5] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning, 1992, Machine Learning.
[6] Jürgen Schmidhuber, et al. State-Dependent Exploration for Policy Gradient Methods, 2008, ECML/PKDD.
[7] J. Spall. Implementation of the simultaneous perturbation algorithm for stochastic optimization, 1998.
[8] Tom Schaul, et al. Fitness Expectation Maximization, 2008, PPSN.
[9] Hans-Paul Schwefel, et al. Evolution and optimum seeking, 1995, Sixth-generation computer technology series.
[10] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[11] Peter L. Bartlett, et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent, 2000, ICML.
[12] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[13] Michael I. Jordan. Attractor dynamics and parallelism in a connectionist sequential machine, 1990.
[14] Douglas Aberdeen, et al. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes, 2003.
[15] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks.
[16] Rémi Munos, et al. Policy Gradient in Continuous Time, 2006, J. Mach. Learn. Res.
[17] Stefan Schaal, et al. Natural Actor-Critic, 2003, ECML.
[18] Nikolaus Hansen, et al. Completely Derandomized Self-Adaptation in Evolution Strategies, 2001, Evolutionary Computation.
[19] Martin A. Riedmiller, et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.