Robot learning from demonstration by constructing skill trees

We describe CST, an online algorithm for constructing skill trees from demonstration trajectories. CST segments a demonstration trajectory into a chain of component skills, where each skill has a goal and is assigned a suitable abstraction from an abstraction library. These properties permit skills to be improved efficiently using a policy learning algorithm. Chains from multiple demonstration trajectories are merged into a skill tree. We show that CST can be used to acquire skills from human demonstration in a dynamic continuous domain, and from both expert demonstration and learned control sequences on the uBot-5 mobile manipulator.
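
To make the segmentation step concrete, here is a minimal sketch, assuming a simplified offline setting: a demonstration is split wherever fitting separate linear value-function models (one per candidate abstraction) to the sampled returns explains the data markedly better than a single model does. The paper's algorithm performs online MAP changepoint detection rather than the greedy top-down splitting used here, and the names `fit_error`, `best_abstraction`, `segment`, and the `gain` threshold are illustrative assumptions, not CST's actual interface.

```python
# Minimal, simplified sketch of CST-style segmentation (assumptions noted
# above): greedy top-down splitting stands in for the paper's online MAP
# changepoint detection over per-abstraction value-function models.
import numpy as np


def fit_error(X, y):
    """Residual sum of squares of a least-squares linear fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))


def best_abstraction(features, returns):
    """Pick the abstraction (feature matrix) whose linear value-function
    model best explains the sampled returns over this segment."""
    return min(features, key=lambda a: fit_error(features[a], returns))


def segment(features, returns, min_len=10, gain=1.0):
    """Recursively split [0, n) at the changepoint that most reduces total
    fit error; stop when no split improves the fit by at least `gain`.
    Returns a list of (start, end, abstraction) segments -- a skill chain."""
    n = len(returns)
    whole_abs = best_abstraction(features, returns)
    whole_err = fit_error(features[whole_abs], returns)
    best_t, best_gain = None, gain
    for t in range(min_len, n - min_len + 1):
        left = {a: X[:t] for a, X in features.items()}
        right = {a: X[t:] for a, X in features.items()}
        err = (fit_error(left[best_abstraction(left, returns[:t])], returns[:t])
               + fit_error(right[best_abstraction(right, returns[t:])], returns[t:]))
        if whole_err - err > best_gain:
            best_t, best_gain = t, whole_err - err
    if best_t is None:
        return [(0, n, whole_abs)]
    left = {a: X[:best_t] for a, X in features.items()}
    right = {a: X[best_t:] for a, X in features.items()}
    return (segment(left, returns[:best_t], min_len, gain)
            + [(s + best_t, e + best_t, a)
               for s, e, a in segment(right, returns[best_t:], min_len, gain)])
```

Each recovered (start, end, abstraction) segment corresponds to one component skill, with the segment's end state serving as the skill's goal. In CST proper, chains recovered this way from multiple demonstrations are then merged into a tree by testing whether corresponding segments are well explained by a shared model.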
