Robot learning from demonstration by constructing skill trees

We describe CST, an online algorithm for constructing skill trees from demonstration trajectories. CST segments a demonstration trajectory into a chain of component skills, where each skill has a goal and is assigned a suitable abstraction from an abstraction library. These properties permit skills to be improved efficiently using a policy learning algorithm. Chains from multiple demonstration trajectories are merged into a skill tree. We show that CST can be used to acquire skills from human demonstration in a dynamic continuous domain, and from both expert demonstration and learned control sequences on the uBot-5 mobile manipulator.
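
To make the segmentation step concrete, here is a minimal sketch, assuming a simplified offline setting: a demonstration is split wherever fitting separate linear value-function models (one per candidate abstraction) to the sampled returns explains the data markedly better than a single model does. The paper's algorithm performs online MAP changepoint detection rather than the greedy top-down splitting used here, and the names `fit_error`, `best_abstraction`, `segment`, and the `gain` threshold are illustrative assumptions, not CST's actual interface.

```python
# Minimal, simplified sketch of CST-style segmentation (assumptions noted
# above): greedy top-down splitting stands in for the paper's online MAP
# changepoint detection over per-abstraction value-function models.
import numpy as np


def fit_error(X, y):
    """Residual sum of squares of a least-squares linear fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))


def best_abstraction(features, returns):
    """Pick the abstraction (feature matrix) whose linear value-function
    model best explains the sampled returns over this segment."""
    return min(features, key=lambda a: fit_error(features[a], returns))


def segment(features, returns, min_len=10, gain=1.0):
    """Recursively split [0, n) at the changepoint that most reduces total
    fit error; stop when no split improves the fit by at least `gain`.
    Returns a list of (start, end, abstraction) segments -- a skill chain."""
    n = len(returns)
    whole_abs = best_abstraction(features, returns)
    whole_err = fit_error(features[whole_abs], returns)
    best_t, best_gain = None, gain
    for t in range(min_len, n - min_len + 1):
        left = {a: X[:t] for a, X in features.items()}
        right = {a: X[t:] for a, X in features.items()}
        err = (fit_error(left[best_abstraction(left, returns[:t])], returns[:t])
               + fit_error(right[best_abstraction(right, returns[t:])], returns[t:]))
        if whole_err - err > best_gain:
            best_t, best_gain = t, whole_err - err
    if best_t is None:
        return [(0, n, whole_abs)]
    left = {a: X[:best_t] for a, X in features.items()}
    right = {a: X[best_t:] for a, X in features.items()}
    return (segment(left, returns[:best_t], min_len, gain)
            + [(s + best_t, e + best_t, a)
               for s, e, a in segment(right, returns[best_t:], min_len, gain)])
```

Each recovered (start, end, abstraction) segment corresponds to one component skill, with the segment's end state serving as the skill's goal. In CST proper, chains recovered this way from multiple demonstrations are then merged into a tree by testing whether corresponding segments are well explained by a shared model.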
