Learning to search: Functional gradient techniques for imitation learning

Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time-consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling “programming by demonstration” for developing high-performance robotic systems. Unfortunately, many “behavioral cloning” (Bain and Sammut in Machine intelligence agents. London: Oxford University Press, 1995; Pomerleau in Advances in neural information processing systems 1, 1989; LeCun et al. in Advances in neural information processing systems 18, 2006) approaches that utilize classical tools of supervised learning (e.g., decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. These systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in favor of a supervised learning approach often leads to myopic, poor-quality robot performance.

While planning algorithms have shown success in many real-world applications ranging from legged locomotion (Chestnutt et al. in Proceedings of the IEEE-RAS international conference on humanoid robots, 2003) to outdoor unstructured navigation (Kelly et al. in Proceedings of the international symposium on experimental robotics (ISER), 2004; Stentz et al. in AUVSI’s unmanned systems, 2007), such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually manually designed and programmed. Recently, a set of techniques has been developed that explores learning these functions from expert human demonstration. These algorithms apply an inverse optimal control approach to find a cost function for which planned behavior mimics an expert’s demonstration.

The work we present extends the Maximum Margin Planning (MMP) framework (Ratliff et al. in Twenty-second international conference on machine learning (ICML ’06), 2006a) to admit learning of more powerful, non-linear cost functions. These algorithms, known collectively as LEARCH (LEArning to seaRCH), are simpler to implement than most existing methods, more efficient than previous attempts at non-linearization (Ratliff et al. in NIPS, 2006b), more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the function’s form. We derive and discuss the framework both mathematically and intuitively, and demonstrate practical real-world performance with three applied case studies: legged locomotion, grasp planning, and autonomous outdoor unstructured navigation. The latter study includes hundreds of kilometers of autonomous traversal through complex natural environments. These case studies address key challenges in applying the algorithm in practical settings that utilize state-of-the-art planners and that may be constrained by efficiency requirements and imperfect expert demonstration.
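At its core, LEARCH performs exponentiated functional gradient descent: plan under the current cost hypothesis, compare where the planner travels against where the expert drove, fit a simple regressor to that discrepancy, and fold the regressor multiplicatively into the cost function. The Python sketch below is a minimal illustration of that loop for a 2-D costmap, not the authors' implementation; plan_min_cost_path and fit_regressor are hypothetical helpers standing in for an A*/Field D*-style planner and any off-the-shelf regression routine.

    import numpy as np

    def learch(features, expert_path, plan_min_cost_path, fit_regressor,
               n_iters=20, eta=0.1):
        # features:     (H, W, d) array of per-cell feature vectors
        # expert_path:  list of (row, col) cells demonstrated by the expert
        # plan_min_cost_path(costmap, start, goal) -> list of (row, col) cells
        #               (hypothetical planner, e.g. A* or Field D*)
        # fit_regressor(X, y) -> callable h mapping (N, d) features to N scores
        #               (hypothetical helper wrapping any regression method)
        H, W, d = features.shape
        log_cost = np.zeros((H, W))      # cost = exp(log_cost) stays positive
        start, goal = expert_path[0], expert_path[-1]
        hypotheses = []

        for _ in range(n_iters):
            planned = plan_min_cost_path(np.exp(log_cost), start, goal)

            # Functional gradient of the margin objective: positive weight on
            # cells the current planner visits, negative weight on cells the
            # expert visited.
            X = np.array([features[r, c] for r, c in planned] +
                         [features[r, c] for r, c in expert_path])
            y = np.array([+1.0] * len(planned) + [-1.0] * len(expert_path))

            h = fit_regressor(X, y)      # weak learner approximating the gradient
            hypotheses.append(h)

            # Exponentiated update: raise cost along the planned path, lower it
            # along the demonstration, without ever violating positivity.
            log_cost += eta * h(features.reshape(-1, d)).reshape(H, W)

        return hypotheses, np.exp(log_cost)

The multiplicative update is what lets LEARCH satisfy positivity constraints on the cost function by construction, rather than by explicit projection as in additive linear updates. The full algorithm additionally plans against a loss-augmented costmap to enforce a margin between the demonstration and competing paths; that refinement is omitted here for brevity.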

[1] R. E. Kalman, et al. When Is a Linear Control System Optimal?, 1964.

[2] Naum Zuselevich Shor, et al. Minimization Methods for Non-Differentiable Functions, 1985, Springer Series in Computational Mathematics.

[3] Andrew P. Sage, et al. Uncertainty in Artificial Intelligence, 1987, IEEE Transactions on Systems, Man, and Cybernetics.

[4] Dean Pomerleau, et al. ALVINN, an autonomous land vehicle in a neural network, 1989, NIPS.

[5] B. Anderson, et al. Optimal control: linear quadratic methods, 1990.

[6] Vladimír Kucera. Optimal control: Linear quadratic methods: Brian D. O. Anderson and John B. Moore, 1992, Autom.

[7] Stephen P. Boyd, et al. Linear Matrix Inequalities in Systems and Control Theory, 1994.

[8] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[9] Stefan Schaal, et al. Memory-based robot learning, 1994, Proceedings of the 1994 IEEE International Conference on Robotics and Automation.

[10] Claude Sammut, et al. A Framework for Behavioural Cloning, 1995, Machine Intelligence 15.

[11] Vladimir A. Yakubovich. Linear Matrix Inequalities in System and Control Theory (S. Boyd, L. E. Ghaoui, E. Feron, and V. Balakrishnan), 1995, SIAM Rev.

[12] Manfred K. Warmuth, et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors, 1997, Inf. Comput.

[13] E. Yaz. Linear Matrix Inequalities in System and Control Theory, 1998, Proceedings of the IEEE.

[14] Geoffrey J. Gordon, et al. Approximate solutions to Markov decision processes, 1999.

[15] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.

[16] Alexander J. Smola, et al. Advances in Large Margin Classifiers, 2000.

[17] Peter L. Bartlett, et al. Functional Gradient Techniques for Combining Hypotheses, 2000.

[18] J. Friedman. Greedy function approximation: A gradient boosting machine, 2001.

[19] D. Donoho, et al. Maximal Sparsity Representation via ℓ1 Minimization, 2002.

[20] Ben Taskar, et al. Max-Margin Markov Networks, 2003, NIPS.

[21] Michael Elad, et al. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization, 2003, Proceedings of the National Academy of Sciences of the United States of America.

[22] Henrik I. Christensen, et al. Automatic grasp planning using shape primitives, 2003, IEEE International Conference on Robotics and Automation.

[23] J. Chestnutt, et al. Planning Biped Navigation Strategies in Complex Environments, 2003.

[24] E. Jaynes. Probability theory: the logic of science, 2003.

[25] Martial Hebert, et al. Quality assessment of traversability maps from aerial LIDAR data for an unmanned ground vehicle, 2003, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003).

[26] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[27] Johan A. K. Suykens, et al. Advances in learning theory: methods, models and applications, 2003.

[28] Pat Langley, et al. Editorial: On Machine Learning, 1986, Machine Learning.

[29] Joel A. Tropp, et al. Greed is good: algorithmic results for sparse approximation, 2004, IEEE Transactions on Information Theory.

[30] Andrew W. Moore, et al. Locally Weighted Learning, 1997, Artificial Intelligence Review.

[31] Jing Peng, et al. SVM vs regularized least squares classification, 2004, ICPR.

[32] Ji Zhu, et al. Boosting as a Regularized Path to a Maximum Margin Classifier, 2004, J. Mach. Learn. Res.

[33] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.

[34] Takeo Kanade, et al. Footstep Planning for the Honda ASIMO Humanoid, 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[35] Ben Taskar, et al. Learning structured prediction models: a large margin approach, 2005, ICML.

[36] Yann LeCun, et al. Off-Road Obstacle Avoidance through End-to-End Learning, 2005, NIPS.

[37] Ben Taskar, et al. Structured Prediction via the Extragradient Method, 2005, NIPS.

[38] J. Andrew Bagnell, et al. Maximum margin planning, 2006, ICML.

[39] David Silver, et al. Experimental Analysis of Overhead Data Processing To Support Long Range Navigation, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[40] Gábor Lugosi, et al. Prediction, learning, and games, 2006.

[41] David M. Bradley, et al. Boosting Structured Prediction for Imitation Learning, 2006, NIPS.

[42] Anthony Stentz, et al. Using interpolation to improve path planning: The Field D* algorithm, 2006, J. Field Robotics.

[43] Fernando Pereira, et al. Structured Learning with Approximate Inference, 2007, NIPS.

[44] Anthony Stentz, et al. The Crusher System for Autonomous Navigation, 2007.

[45] Nathan Ratliff, et al. (Online) Subgradient Methods for Structured Prediction, 2007.

[46] J. Andrew Bagnell, et al. (Approximate) Subgradient Methods for Structured Prediction, 2007, International Conference on Artificial Intelligence and Statistics.

[47] Aude Billard, et al. On Learning, Representing, and Generalizing a Task in a Humanoid Robot, 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[48] T. Poggio, et al. Regularized Least-Squares Classification, 2007.

[49] Pieter Abbeel, et al. Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion, 2007, NIPS.

[50] Siddhartha S. Srinivasa, et al. Imitation learning for locomotion and manipulation, 2007, 7th IEEE-RAS International Conference on Humanoid Robots.

[51] Csaba Szepesvári, et al. Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods, 2007, UAI.

[52] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.

[53] Martial Hebert, et al. Directional Associative Markov Network for 3-D Point Cloud Classification, 2008.

[54] Aude Billard, et al. Dynamical System Modulation for Robot Learning via Kinesthetic Demonstrations, 2008, IEEE Transactions on Robotics.

[55] Brett Browning, et al. A survey of robot learning from demonstration, 2009, Robotics Auton. Syst.

[56] Martial Hebert, et al. Contextual classification with functional Max-Margin Markov Networks, 2009, CVPR.

[57] Oliver Brock, et al. High Performance Outdoor Navigation from Overhead Data using Imitation Learning, 2009.

[58] Nathan D. Ratliff. Functional Bundle Methods, 2009.