Biped dynamic walking using reinforcement learning

This paper presents results from a study of biped dynamic walking using reinforcement learning. During the study, a hardware biped robot was built, and a new reinforcement learning algorithm and a new learning architecture were developed. The biped learned dynamic walking without any prior knowledge of its dynamic model. The self-scaling reinforcement (SSR) learning algorithm was developed to deal with the problem of reinforcement learning in continuous action domains. The learning architecture was developed to solve complex control problems. It uses different modules that consist of simple controllers and small neural networks. The architecture allows easy incorporation of new modules that represent new knowledge or new requirements for the desired task.
