Active online learning of bipedal walking

For legged-robot walking pattern learning, most current state-of-the-art research follows a so-called simulation-based framework, in which the walking pattern is learned on a pre-established simulation platform. However, when the learned walking pattern is transferred to a real robot, an additional adaptation procedure is always required because of the large gap between simulated and real walking conditions. This problem is more critical for bipedal walking, which is harder to control than other forms of legged locomotion, such as that of quadruped robots. In this paper, a novel framework for active online learning of bipedal walking directly on a physical robot is proposed. To make the learning procedure both fast-converging and efficient, it combines a polynomial response surrogate model, an active learning strategy based on orthogonal experimental design, and a gradient ascent algorithm. Experimental results on the real humanoid robot PKU-HR3 demonstrate the framework's effectiveness and indicate that it is a promising alternative for learning bipedal walking patterns.
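The abstract names three ingredients: trials placed by an orthogonal experimental design, a polynomial surrogate fitted to the measured responses, and gradient ascent on that surrogate. The following is a minimal sketch of one plausible way to combine them, not the authors' implementation. The two normalised gait parameters, the standard L9(3^4) orthogonal array, the full quadratic basis, the shrinking trial region, and the synthetic objective `speed` (standing in for a walking trial on the physical robot) are all assumptions made for illustration.

```python
# Sketch only. Assumptions (not from the paper): two hypothetical gait parameters
# normalised to [-1, 1], the standard L9(3^4) orthogonal array for trial placement,
# a full quadratic surrogate, and a synthetic objective standing in for a real trial.

# Standard L9(3^4) orthogonal array, levels 0/1/2; only the first two columns are used.
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]


def features(x, y):
    """Full second-order polynomial basis in two variables."""
    return [1.0, x, y, x * x, y * y, x * y]


def solve(a, b):
    """Solve the square system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_quadratic(points, values):
    """Least-squares surrogate coefficients via the normal equations (A^T A) w = A^T v."""
    rows = [features(x, y) for x, y in points]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atv = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    return solve(ata, atv)


def surrogate_grad(w, x, y):
    """Analytic gradient of the quadratic surrogate at (x, y)."""
    return (w[1] + 2 * w[3] * x + w[5] * y,
            w[2] + 2 * w[4] * y + w[5] * x)


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def learn_gait(trial, center=(0.0, 0.0), delta=0.5, iterations=5,
               lr=0.1, ascent_steps=200, shrink=0.7):
    """Return the gait parameters reached after `iterations` active-learning rounds."""
    cx, cy = center
    for _ in range(iterations):
        # Keep the 3-level design inside [-1, 1] so all three levels stay distinct.
        dx, dy = clip(cx, delta - 1, 1 - delta), clip(cy, delta - 1, 1 - delta)
        # 1. Active learning: place the trials on the orthogonal array around the centre.
        pts = [(dx + delta * (a - 1), dy + delta * (b - 1)) for a, b, _, _ in L9]
        # 2. Evaluate each trial (a real walking experiment in the paper's setting).
        vals = [trial(x, y) for x, y in pts]
        # 3. Fit the polynomial surrogate to the measured responses.
        w = fit_quadratic(pts, vals)
        # 4. Gradient ascent on the cheap surrogate, confined to the sampled box.
        x, y = dx, dy
        for _ in range(ascent_steps):
            gx, gy = surrogate_grad(w, x, y)
            x = clip(x + lr * gx, dx - delta, dx + delta)
            y = clip(y + lr * gy, dy - delta, dy + delta)
        cx, cy = x, y
        delta *= shrink  # 5. Shrink the trial region around the new centre.
    return cx, cy


if __name__ == "__main__":
    # Hypothetical stand-in for measured forward speed, peaking at (0.3, -0.2).
    speed = lambda x, y: 1 - (x - 0.3) ** 2 - 2 * (y + 0.2) ** 2
    print(learn_gait(speed))
```

In this sketch each round costs nine trial evaluations, while the many gradient steps run only on the fitted surrogate, which is what makes a surrogate attractive when every evaluation is a walk on real hardware.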