Enhancing the episodic natural actor-critic algorithm by a regularisation term to stabilize learning of control structures

Incomplete or imprecise models of control systems make it difficult to find an appropriate structure and parameter set for a corresponding control policy. Reinforcement learning algorithms such as policy gradient methods address these problems. We describe how to stabilise the policy gradient descent by introducing a regularisation term, enhancing the episodic natural actor-critic approach and allowing a more policy-independent usage.
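As a rough illustration of the idea (not the paper's actual algorithm), the episodic natural actor-critic estimates the natural gradient by regressing episode returns onto summed log-policy gradients; a Tikhonov-style regularisation term `lam * I` added to the normal equations is one common way to stabilise such a least-squares step. All names, shapes, and the choice of ridge regularisation here are assumptions for the sketch:

```python
import numpy as np

def enac_gradient(psi, returns, lam=1e-2):
    """Sketch of a regularised eNAC-style natural gradient estimate.

    psi     : (E, d) matrix, one row per episode of summed score functions
    returns : (E,)   vector of episode returns
    lam     : regularisation strength; lam=0 recovers plain least squares
    """
    # Append a constant feature so a baseline is estimated jointly
    # with the gradient, as in the episodic formulation.
    X = np.hstack([psi, np.ones((psi.shape[0], 1))])
    d = X.shape[1]
    # Regularised normal equations: (X^T X + lam*I) w = X^T R.
    # The lam*I term keeps the system well-conditioned when episodes
    # provide nearly collinear gradient information.
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ returns)
    return w[:-1], w[-1]  # (natural gradient estimate, baseline)

# Toy usage with synthetic episode data (purely illustrative).
rng = np.random.default_rng(0)
psi = rng.normal(size=(20, 3))
R = psi @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.1, size=20)
grad, baseline = enac_gradient(psi, R, lam=1e-2)
```

Without the `lam * np.eye(d)` term, `X.T @ X` can be near-singular for short or correlated episode batches, which is exactly the instability the regularisation term is meant to damp.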
