Reinforcement Learning for CPG-Driven Biped Robot

Rhythmic movements in animals, such as locomotion, are thought to be controlled by neural circuits called central pattern generators (CPGs). This article presents a reinforcement learning (RL) method for a CPG controller, inspired by this biological control mechanism. Because a CPG controller is an instance of a recurrent neural network, a naive application of RL runs into difficulties. Moreover, since the state and action spaces of controlled systems are very large in real problems such as robot control, learning the value function is also difficult. In this study, we propose a learning scheme for a CPG controller, called the CPG-actor-critic model, whose learning algorithm is based on a policy gradient method. We apply our RL method to the autonomous acquisition of biped locomotion by a biped robot simulator. Computer simulations show that our method can train a CPG controller with a stable learning process.
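To make the CPG side of this architecture concrete, the sketch below implements a Matsuoka-style neural oscillator, a common building block of CPG controllers in the locomotion literature. It is an illustrative assumption rather than the paper's specific network: the class name, parameter values, and feedback term are hypothetical, and in the CPG-actor-critic setting one would imagine the actor modulating the oscillator's inputs while the critic evaluates the resulting gait.

```python
import numpy as np

class MatsuokaOscillator:
    """Two mutually inhibiting neurons producing a rhythmic output.

    A minimal sketch of a Matsuoka-style neural oscillator; all parameter
    values are illustrative and not taken from the paper.
    """

    def __init__(self, tau=0.05, tau_prime=0.6, beta=2.5, w=2.0, u0=1.0):
        self.tau = tau              # time constant of the membrane states
        self.tau_prime = tau_prime  # time constant of the adaptation states
        self.beta = beta            # strength of self-inhibition (adaptation)
        self.w = w                  # mutual inhibition between the two neurons
        self.u0 = u0                # tonic input that sustains the oscillation
        self.x = np.array([0.1, -0.1])  # membrane states
        self.v = np.zeros(2)            # adaptation states

    def step(self, dt, feedback=0.0):
        """Advance the oscillator by dt; `feedback` is a sensory input term."""
        y = np.maximum(self.x, 0.0)           # rectified firing rates
        fb = np.array([feedback, -feedback])  # antisymmetric sensory drive
        dx = (-self.x - self.beta * self.v
              - self.w * y[::-1] + self.u0 + fb) / self.tau
        dv = (-self.v + y) / self.tau_prime
        self.x += dt * dx
        self.v += dt * dv
        return y[0] - y[1]                    # output, e.g. a joint torque command


if __name__ == "__main__":
    osc = MatsuokaOscillator()
    outputs = [osc.step(dt=0.001) for _ in range(5000)]
    print(min(outputs), max(outputs))  # a sustained oscillation gives a clear swing
```

With the chosen parameters the mutual inhibition is strong enough to sustain a limit-cycle oscillation, which is what lets a CPG produce rhythmic joint commands without a time-varying reference trajectory.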
