Learning grounded finite-state representations from unstructured demonstrations

Robots exhibit flexible behavior largely in proportion to their degree of knowledge about the world. Such knowledge is often meticulously hand-coded for a narrow class of tasks, limiting the scope of possible robot competencies. Thus, the primary limiting factor of robot capabilities is often not the physical attributes of the robot, but the limited time and skill of expert programmers. One way to deal with the vast number of situations and environments that robots face outside the laboratory is to provide users with simple methods for programming robots that do not require the skill of an expert. For this reason, learning from demonstration (LfD) has become a popular alternative to traditional robot programming methods, aiming to provide a natural mechanism for quickly teaching robots. By simply showing a robot how to perform a task, users can easily demonstrate new tasks as needed, without any special knowledge about the robot. Unfortunately, LfD often yields little knowledge about the world, and thus lacks robust generalization capabilities, especially for complex, multi-step tasks. We present a series of algorithms that draw from recent advances in Bayesian non-parametric statistics and control theory to automatically detect and leverage repeated structure at multiple levels of abstraction in demonstration data. The discovery of repeated structure provides critical insights into task invariants, features of importance, high-level task structure, and appropriate skills for the task. This culminates in the discovery of a finite-state representation of the task, composed of grounded skills that are flexible and reusable, providing robust generalization and transfer in complex, multi-step robotic tasks. These algorithms are tested and evaluated using a PR2 mobile manipulator, showing success on several complex real-world tasks, such as furniture assembly.
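To make the pipeline in the abstract concrete, the sketch below illustrates the two key ideas in miniature: (1) splitting a demonstration trajectory into segments wherever its underlying dynamics change, and (2) collapsing the resulting segment labels into a finite-state transition structure. This is only a toy stand-in, using a simple AR(1) change-point heuristic rather than the Bayesian nonparametric model the abstract refers to; all function names, the window size, and the threshold are illustrative assumptions.

```python
import numpy as np

def segment_by_dynamics(traj, window=20, thresh=0.5):
    """Toy segmentation: fit an AR(1) coefficient to each window of a 1-D
    trajectory and declare a segment boundary wherever the coefficient
    jumps sharply (a crude proxy for a change in underlying dynamics)."""
    coeffs = []
    for i in range(0, len(traj) - window, window):
        x, y = traj[i:i + window - 1], traj[i + 1:i + window]
        # Least-squares AR(1) fit: y ~= a * x (small regularizer for stability)
        a = np.dot(x, y) / (np.dot(x, x) + 1e-9)
        coeffs.append(a)
    breaks = [0]
    for i in range(1, len(coeffs)):
        if abs(coeffs[i] - coeffs[i - 1]) > thresh:
            breaks.append(i * window)
    return breaks, coeffs

def build_fsm(segment_labels):
    """Collapse a per-segment label sequence into the transition set of a
    finite-state task representation (repeated labels merge into one state)."""
    transitions = set()
    for a, b in zip(segment_labels, segment_labels[1:]):
        if a != b:
            transitions.add((a, b))
    return transitions

# A synthetic demonstration: dynamics switch from a slow decay to a fast one
# at t = 100, so the segmenter should recover a boundary there.
traj = [1.0]
for t in range(1, 100):
    traj.append(0.95 * traj[-1])
for t in range(100, 200):
    traj.append(0.2 * traj[-1])
breaks, _ = segment_by_dynamics(np.array(traj))
```

In the full approach described above, the recovered segments are additionally clustered across demonstrations, so that repeated structure yields reusable, grounded skills rather than one-off segments.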