Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories

We introduce CST, an algorithm for constructing skill trees from demonstration trajectories in continuous reinforcement learning domains. CST uses a change-point detection method to segment each trajectory into a skill chain by detecting a change of appropriate abstraction, or that a segment is too complex to model as a single skill. The skill chains from each trajectory are then merged to form a skill tree. We demonstrate that CST constructs an appropriate skill tree that can be further refined through learning in a challenging continuous domain, and that it can be used to segment demonstration trajectories on a mobile manipulator into chains of skills where each skill is assigned an appropriate abstraction.
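
The abstract describes a two-stage pipeline: segment each demonstration into a skill chain via change-point detection, then merge the chains into a skill tree. The sketch below illustrates that structure only; it is a minimal stand-in, assuming greedy least-squares splitting in place of the paper's online MAP change-point detection over value-function abstractions, and linear-model similarity in place of its chain-merging criterion. All names here (`segment`, `chain_models`, `merge_chains`) are hypothetical.

```python
"""Minimal sketch of a CST-style pipeline; not the authors' implementation.

Hypothetical simplifications:
- change-point detection is approximated by greedy least-squares splitting;
- chains are merged from the goal backwards whenever trailing segments have
  similar linear models.
"""
import numpy as np


def fit_cost(y):
    """Sum of squared residuals of a linear fit to a 1-D signal."""
    if len(y) < 3:
        return 0.0
    t = np.arange(len(y))
    coeffs = np.polyfit(t, y, 1)
    return float(np.sum((np.polyval(coeffs, t) - y) ** 2))


def segment(y, threshold=1.0, min_len=5):
    """Recursively split a trajectory while one linear model fits it poorly.

    Returns a list of segment lengths (the "skill chain" in this sketch).
    """
    if len(y) <= 2 * min_len or fit_cost(y) <= threshold:
        return [len(y)]
    splits = range(min_len, len(y) - min_len)
    costs = [fit_cost(y[:k]) + fit_cost(y[k:]) for k in splits]
    k = splits[int(np.argmin(costs))]
    return segment(y[:k], threshold, min_len) + segment(y[k:], threshold, min_len)


def chain_models(y, lengths):
    """Fit one (slope, intercept) model per segment of a chain."""
    models, start = [], 0
    for n in lengths:
        models.append(tuple(np.polyfit(np.arange(n), y[start:start + n], 1)))
        start += n
    return models


def merge_chains(chain_a, chain_b, tol=0.5):
    """Merge two skill chains from the final (goal) skill backwards.

    Trailing segments with similar models are treated as one shared skill;
    the first disagreement starts separate branches of the tree.
    """
    shared = []
    for ma, mb in zip(reversed(chain_a), reversed(chain_b)):
        if not np.allclose(ma, mb, atol=tol):
            break
        shared.append(ma)
    n = len(shared)
    return {"shared": list(reversed(shared)),
            "branches": [chain_a[:len(chain_a) - n], chain_b[:len(chain_b) - n]]}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tail = np.linspace(0.0, 5.0, 40)  # final skill shared by both demonstrations
    demo_a = np.concatenate([np.full(40, 1.0), tail]) + rng.normal(0, 0.05, 80)
    demo_b = np.concatenate([np.linspace(3.0, 1.0, 40), tail]) + rng.normal(0, 0.05, 80)
    chains = [chain_models(d, segment(d)) for d in (demo_a, demo_b)]
    print(merge_chains(chains[0], chains[1]))  # shared goal skill plus two branches
```

Running the example merges the final ramp segment that both synthetic demonstrations share into a common goal skill and leaves their differing initial segments as separate branches, mirroring how skill chains from multiple trajectories would combine into a tree.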
