Parameter-exploring policy gradients

[1] Martin Lauer, et al. Making a Robot Learn to Play Soccer Using Reward and Punishment, 2007, KI.

[2] Judy A. Franklin, et al. Biped dynamic walking using reinforcement learning, 1997, Robotics Auton. Syst.

[3] Nicol N. Schraudolph, et al. Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation, 2005, NIPS.

[4] James C. Spall, et al. An Overview of the Simultaneous Perturbation Method for Efficient Optimization, 1998.

[5] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning, 1992, Machine Learning.

[6] Jürgen Schmidhuber, et al. State-Dependent Exploration for Policy Gradient Methods, 2008, ECML/PKDD.

[7] J. Spall. Implementation of the simultaneous perturbation algorithm for stochastic optimization, 1998.

[8] Tom Schaul, et al. Fitness Expectation Maximization, 2008, PPSN.

[9] Hans-Paul Schwefel, et al. Evolution and optimum seeking, 1995, Sixth-generation computer technology series.

[10] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.

[11] Peter L. Bartlett, et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent, 2000, ICML.

[12] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[13] Michael I. Jordan. Attractor dynamics and parallelism in a connectionist sequential machine, 1990.

[14] Douglas Aberdeen, et al. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes, 2003.

[15] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks.

[16] Rémi Munos, et al. Policy Gradient in Continuous Time, 2006, J. Mach. Learn. Res.

[17] Stefan Schaal, et al. Natural Actor-Critic, 2003, ECML.

[18] Nikolaus Hansen, et al. Completely Derandomized Self-Adaptation in Evolution Strategies, 2001, Evolutionary Computation.

[19] Martin A. Riedmiller, et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.