Contextual Policy Search for Linear and Nonlinear Generalization of a Humanoid Walking Controller

We investigate learning of flexible robot locomotion controllers, i.e., controllers that are applicable to multiple contexts, for example different walking speeds, various slopes of the terrain or other physical properties of the robot. In our experiments, the context is the desired linear walking speed of the gait. Current approaches for learning the control parameters of biped locomotion controllers are typically only applicable to a single context. They can be used for a particular context, for example to learn a gait with the highest speed, the lowest energy consumption or a combination of both. The question of our research is: how can we obtain a flexible walking controller that controls the robot (near) optimally for many different contexts? We achieve the desired flexibility of the controller by applying the recently developed contextual relative entropy policy search (REPS) method, which generalizes the robot walking controller to different contexts, where a context is described by a real-valued vector. In this paper, we also extend the contextual REPS algorithm to learn a nonlinear policy instead of a linear policy over the contexts; we call this extension RBF-REPS, as it uses radial basis functions. To validate our method, we perform three simulation experiments, including a walking experiment using a simulated NAO humanoid robot. The robot learns a policy to choose the controller parameters for a continuous set of forward walking speeds.
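The abstract describes contextual REPS and its RBF extension only at a high level. The following is a minimal NumPy/SciPy sketch of the idea under our own assumptions, not the authors' implementation. It assumes a Gaussian policy pi(theta | s) = N(theta | K^T psi(s), Sigma). It also assumes a sample-based contextual REPS dual with the empirical mean of the context features as the feature-matching target, and it uses the same features for the baseline and the policy mean. Swapping `linear_features` for `rbf_features` mimics the step from a linear to a nonlinear (RBF) policy over contexts. The RBF centers and width, the toy reward built on `target(s) = sin(2*pi*s)`, and all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def linear_features(s):
    """Linear policy over contexts: psi(s) = [1, s]."""
    return np.hstack([np.ones((s.shape[0], 1)), s])


def rbf_features(s, centers=np.linspace(0.0, 1.0, 10)[:, None], width=0.1):
    """Nonlinear policy over contexts: a bias plus Gaussian RBFs.
    The centers and width are hypothetical choices for contexts in [0, 1]."""
    d2 = ((s[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.hstack([np.ones((s.shape[0], 1)), np.exp(-0.5 * d2 / width**2)])


def reps_weights(R, phi, epsilon):
    """Minimize an (assumed) sample-based contextual REPS dual g(eta, v); return sample weights."""
    phi_hat = phi.mean(axis=0)  # empirical expected context features

    def dual(x):
        eta, v = x[0], x[1:]
        z = (R - phi @ v) / eta  # rewards minus context-dependent baseline, scaled by 1/eta
        zmax = z.max()
        e = np.exp(z - zmax)
        lme = zmax + np.log(e.mean())  # numerically stable log-mean-exp
        p = e / e.sum()
        g = eta * epsilon + v @ phi_hat + eta * lme
        grad = np.concatenate([[epsilon + lme - p @ z], phi_hat - phi.T @ p])
        return g, grad

    x0 = np.concatenate([[1.0], np.zeros(phi.shape[1])])
    bounds = [(1e-8, None)] + [(None, None)] * phi.shape[1]
    x = minimize(dual, x0, jac=True, method="L-BFGS-B", bounds=bounds).x
    z = (R - phi @ x[1:]) / x[0]
    w = np.exp(z - z.max())
    return w / w.sum()


def fit_policy(psi, theta, w, reg=1e-6):
    """Weighted maximum-likelihood fit of pi(theta | s) = N(theta | K^T psi(s), Sigma)."""
    psi_w = psi.T * w  # psi^T W, with W = diag(w)
    K = np.linalg.solve(psi_w @ psi + reg * np.eye(psi.shape[1]), psi_w @ theta)
    resid = theta - psi @ K
    Sigma = (resid.T * w) @ resid + reg * np.eye(theta.shape[1])  # w sums to 1
    return K, Sigma


def target(s):
    """Hypothetical optimal parameter for context s (nonlinear in s)."""
    return np.sin(2 * np.pi * s)


def run(feature_fn, iters=30, n=200, epsilon=0.5, seed=0):
    """Run the sketch on the toy task; return the mean policy's squared error on a grid."""
    rng = np.random.default_rng(seed)
    K = np.zeros((feature_fn(np.zeros((1, 1))).shape[1], 1))
    Sigma = np.eye(1)
    for _ in range(iters):
        s = rng.uniform(0.0, 1.0, size=(n, 1))  # contexts, e.g. desired walking speeds
        psi = feature_fn(s)
        theta = psi @ K + rng.multivariate_normal(np.zeros(1), Sigma, size=n)
        R = -((theta - target(s)) ** 2).sum(axis=1)  # one reward per episode
        w = reps_weights(R, psi, epsilon)  # assumption: baseline features = policy features
        K, Sigma = fit_policy(psi, theta, w)
    s_test = np.linspace(0.0, 1.0, 101)[:, None]
    return np.mean((feature_fn(s_test) @ K - target(s_test)) ** 2)


print("linear:", run(linear_features), "rbf:", run(rbf_features))
```

In this toy setting, the linear policy cannot represent the nonlinear optimum `sin(2*pi*s)`, so its error stays bounded below by that of the best straight-line fit. The RBF policy can approach the optimum across the whole context range.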
