Enhancing the episodic natural actor-critic algorithm by a regularisation term to stabilize learning of control structures

Incomplete or imprecise models of control systems make it difficult to find an appropriate structure and parameter set for a corresponding control policy. Reinforcement learning algorithms such as policy gradient methods address these problems. We describe how to stabilise the policy gradient descent by introducing a regularisation term, enhancing the episodic natural actor-critic approach and allowing a more policy-independent usage.
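As a rough illustration of the idea (not the paper's actual algorithm), the episodic natural actor-critic estimates the natural gradient by regressing episode returns onto summed log-policy gradients; a Tikhonov-style regularisation term `lam * I` added to the normal equations is one common way to stabilise such a least-squares step. All names, shapes, and the choice of ridge regularisation here are assumptions for the sketch:

```python
import numpy as np

def enac_gradient(psi, returns, lam=1e-2):
    """Sketch of a regularised eNAC-style natural gradient estimate.

    psi     : (E, d) matrix, one row per episode of summed score functions
    returns : (E,)   vector of episode returns
    lam     : regularisation strength; lam=0 recovers plain least squares
    """
    # Append a constant feature so a baseline is estimated jointly
    # with the gradient, as in the episodic formulation.
    X = np.hstack([psi, np.ones((psi.shape[0], 1))])
    d = X.shape[1]
    # Regularised normal equations: (X^T X + lam*I) w = X^T R.
    # The lam*I term keeps the system well-conditioned when episodes
    # provide nearly collinear gradient information.
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ returns)
    return w[:-1], w[-1]  # (natural gradient estimate, baseline)

# Toy usage with synthetic episode data (purely illustrative).
rng = np.random.default_rng(0)
psi = rng.normal(size=(20, 3))
R = psi @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.1, size=20)
grad, baseline = enac_gradient(psi, R, lam=1e-2)
```

Without the `lam * np.eye(d)` term, `X.T @ X` can be near-singular for short or correlated episode batches, which is exactly the instability the regularisation term is meant to damp.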
