Policy Learning - A Unified Perspective with Applications in Robotics

Policy learning approaches are among the methods best suited to high-dimensional, continuous control systems such as anthropomorphic robot arms and humanoid robots. In this paper, we make two contributions. First, we present a unified perspective from which several policy learning algorithms can be derived from a common point of view, namely policy gradient algorithms, natural-gradient algorithms, and EM-like policy learning. Second, we present several applications, both to robot motor primitive learning and to robot control in task space. Results from simulation and from several different real robots are shown.
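To give a flavor of the policy gradient family mentioned above, the following is a minimal, self-contained sketch of a REINFORCE-style update on a two-armed bandit. The problem setup, parameter names, and learning rate are illustrative assumptions of ours, not details taken from the paper.

```python
import math
import random

def softmax(theta):
    # Numerically stable softmax over action preferences.
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    s = sum(e)
    return [x / s for x in e]

def reinforce(steps=2000, alpha=0.1, seed=0):
    # Illustrative two-armed bandit: arm 1 pays 1.0, arm 0 pays 0.2,
    # so the policy should learn to prefer arm 1.
    rng = random.Random(seed)
    theta = [0.0, 0.0]          # softmax action preferences
    rewards = [0.2, 1.0]        # deterministic payoff of each arm
    baseline = 0.0              # running average reward (variance reduction)
    for t in range(steps):
        p = softmax(theta)
        a = 0 if rng.random() < p[0] else 1
        r = rewards[a]
        baseline += (r - baseline) / (t + 1)
        # For a softmax policy, grad log pi(a) w.r.t. theta[i] is 1[a=i] - p[i];
        # scale it by the advantage (r - baseline) and step uphill.
        for i in range(2):
            g = (1.0 if i == a else 0.0) - p[i]
            theta[i] += alpha * (r - baseline) * g
    return softmax(theta)

probs = reinforce()
```

After training, `probs[1]` should be close to 1, i.e. the stochastic policy concentrates on the better arm; natural-gradient methods refine the same update direction by preconditioning with the Fisher information matrix.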