论文信息 - An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm

An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm

Recently, actor-critic methods have drawn much interests in the area of reinforcement learning, and several algorithms have been studied along the line of the actor-critic strategy. This paper studies an actor-critic type algorithm utilizing the RLS(recursive least-squares) method, which is one of the most efficient techniques for adaptive signal processing, together with natural policy gradient. In the actor part of the studied algorithm, we follow the strategy of performing parameter update via the natural gradient method, while in its update for the critic part, the recursive least-squares method is employed in order to make the parameter estimation for the value functions more efficient. The studied algorithm was applied to locomotion of a two-linked robot arm, and showed better performance compared to the conventional stochastic gradient ascent algorithm.

[1] Vijay R. Konda,et al. OnActor-Critic Algorithms , 2003, SIAM J. Control. Optim..

[2] T. Moon,et al. Mathematical Methods and Algorithms for Signal Processing , 1999 .

[3] H. He,et al. Efficient Reinforcement Learning Using Recursive Least-Squares Methods , 2011, J. Artif. Intell. Res..

[4] Shigenobu Kobayashi,et al. Reinforcement Learning in POMDPs with Function Approximation , 1997, ICML.

[5] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .

[6] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.

[7] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[8] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[9] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .

[10] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.