Learning locomotion over rough terrain using terrain templates

We address the problem of foothold selection in robotic legged locomotion over very rough terrain. The difficulty of this problem is comparable to that of human rock climbing, where foot- and hand-hold selection is one of the most critical aspects. Previous work in this domain typically defines a reward function over footholds as a weighted linear combination of terrain features. However, designing features that can model more complex decision functions takes significant effort, and hand-tuning their weights is not a trivial task. We propose the use of terrain templates, which are discretized height maps of the terrain under a foothold at different length scales, as an alternative to manually designed features. We describe an algorithm that simultaneously learns a small set of templates and a foothold ranking function using these templates from expert-demonstrated footholds. Using the LittleDog quadruped robot, we show experimentally that terrain templates can produce complex ranking functions with higher performance than standard terrain features, and with improved generalization to unseen terrain.
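To make the idea of a terrain template concrete, the following is a minimal Python sketch and not the authors' implementation. It crops height-map patches around a foothold at two length scales, compares a candidate foothold's patches with a small template library, and ranks candidates by a weighted sum of the similarities. All names (`extract_template`, `template_features`, `rank_footholds`), the Gaussian similarity measure, the bandwidth, the synthetic step terrain, and the hand-set weights are assumptions made for illustration; the abstract states that the templates and the ranking function are learned from expert demonstrations, which this sketch does not do.

```python
import numpy as np

def extract_template(height_map, row, col, half_size, grid=5):
    """Sample a grid x grid patch around (row, col), relative to the center height."""
    offsets = np.linspace(-half_size, half_size, grid).round().astype(int)
    rows = np.clip(row + offsets, 0, height_map.shape[0] - 1)
    cols = np.clip(col + offsets, 0, height_map.shape[1] - 1)
    patch = height_map[np.ix_(rows, cols)]
    return (patch - height_map[row, col]).ravel()

def multiscale_templates(height_map, row, col, half_sizes=(2, 6)):
    """One template per length scale (half_sizes are in grid cells; assumed values)."""
    return [extract_template(height_map, row, col, h) for h in half_sizes]

def template_features(candidate, library, bandwidth=50.0):
    """Similarity of the candidate's patch to each library template of the same scale.

    The Gaussian similarity and the bandwidth are illustrative assumptions.
    """
    return np.array([
        np.exp(-bandwidth * np.sum((candidate[scale] - template) ** 2))
        for scale, template in library
    ])

def rank_footholds(height_map, candidates, library, weights):
    """Order candidate footholds from best to worst by a linear score over template features."""
    scores = [
        weights @ template_features(multiscale_templates(height_map, r, c), library)
        for r, c in candidates
    ]
    order = np.argsort(scores)[::-1]
    return [candidates[i] for i in order]

# Synthetic terrain (assumption): flat ground with a 10 cm step at column 20.
height_map = np.zeros((40, 40))
height_map[:, 20:] = 0.1

# Hypothetical demonstrations: a foothold on flat ground and one on the step edge.
good = multiscale_templates(height_map, 10, 5)
bad = multiscale_templates(height_map, 10, 20)
library = [(0, good[0]), (1, good[1]), (0, bad[0]), (1, bad[1])]

# Hand-set weights for illustration; the paper learns these from expert data.
weights = np.array([1.0, 1.0, -1.0, -1.0])

candidates = [(25, 20), (30, 18), (30, 8)]
ranked = rank_footholds(height_map, candidates, library, weights)
print(ranked)
```

In this toy setup the flat candidate at `(30, 8)` ranks first and the candidate on the step edge at `(25, 20)` ranks last. The candidate at `(30, 18)` lands in between because it looks flat at the small scale but not at the large one, which is the kind of distinction that multiple length scales are meant to capture.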
