Paper information - Parameter-exploring policy gradients - 字舞流文

Parameter-exploring policy gradients

Frank Sehnke | Christian Osendorfer | Jürgen Schmidhuber | Alex Graves | Jan Peters | Thomas Rückstieß

[1] Jürgen Schmidhuber, et al. State-Dependent Exploration for Policy Gradient Methods, 2008, ECML/PKDD.

[2] Tom Schaul, et al. Fitness Expectation Maximization, 2008, PPSN.

[3] Stefan Schaal, et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients, 2008.

[4] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.

[5] Martin Lauer, et al. Making a Robot Learn to Play Soccer Using Reward and Punishment, 2007, KI.

[6] Martin A. Riedmiller, et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.

[7] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[8] Rémi Munos, et al. Policy Gradient in Continuous Time, 2006, J. Mach. Learn. Res.

[9] Nicol N. Schraudolph, et al. Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation, 2005, NIPS.

[10] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.

[11] Douglas Aberdeen, et al. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes, 2003.

[12] Nikolaus Hansen, et al. Completely Derandomized Self-Adaptation in Evolution Strategies, 2001, Evolutionary Computation.

[13] Peter L. Bartlett, et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent, 2000, ICML.

[14] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.

[15] J. Spall. Implementation of the simultaneous perturbation algorithm for stochastic optimization, 1998.

[16] James C. Spall, et al. An Overview of the Simultaneous Perturbation Method for Efficient Optimization, 1998.

[17] Judy A. Franklin, et al. Biped dynamic walking using reinforcement learning, 1997, Robotics Auton. Syst.

[18] Hans-Paul Schwefel, et al. Evolution and optimum seeking, 1995, Sixth-generation computer technology series.

[19] Michael I. Jordan. Attractor dynamics and parallelism in a connectionist sequential machine, 1990.