Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills

Parameterized movement primitives have been extensively used for imitation learning of robotic tasks. However, the high dimensionality of the parameter space hinders the improvement of such primitives in the reinforcement learning (RL) setting, especially for learning with physical robots. In this paper we propose a novel view on handling demonstrated trajectories to acquire low-dimensional, non-linear latent dynamics, using mixtures of probabilistic principal component analyzers (MPPCA) on the movements' parameter space. Moreover, we introduce a new contextual off-policy RL algorithm, named LAtent-Movements Policy Optimization (LAMPO). LAMPO provides gradient estimates from previous experience using self-normalized importance sampling, thus making full use of samples collected in earlier learning iterations. Together, these contributions yield a complete framework for sample-efficient off-policy optimization of movement primitives for robot learning of high-dimensional manipulation skills. Our experiments, conducted both in simulation and on a real robot, show that LAMPO yields more sample-efficient policies than common approaches in the literature.
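Self-normalized importance sampling, as mentioned in the abstract, reweights returns collected under earlier (behavior) policies by the ratio of target-to-behavior densities, with the ratios normalized to sum to one so that the estimator stays bounded. Below is a minimal NumPy sketch of that estimator for a likelihood-ratio policy gradient; it is not the paper's implementation, and all names (`snis_policy_gradient`, `score_target`, the toy Gaussian policy in the demo) are illustrative assumptions.

```python
import numpy as np

def snis_policy_gradient(returns, logp_behavior, logp_target, score_target):
    """Self-normalized importance-sampling estimate of a policy gradient.

    returns       : (N,)   return of each reused rollout
    logp_behavior : (N,)   log-density of the sampled parameters under the
                           behavior policy that generated them
    logp_target   : (N,)   log-density under the current target policy
    score_target  : (N, d) gradient of logp_target w.r.t. the policy
                           parameters, one row per sample
    """
    # Importance ratios, handled in log space for numerical stability.
    log_w = logp_target - logp_behavior
    log_w -= log_w.max()              # shift to avoid overflow in exp
    w = np.exp(log_w)
    w /= w.sum()                      # self-normalization: weights sum to 1
    # Weighted likelihood-ratio (REINFORCE-style) gradient estimate.
    return (w * returns) @ score_target

# Toy demo: 1-D Gaussian policy whose mean is the learnable parameter.
rng = np.random.default_rng(0)
mu_b, mu_t, sigma = 0.0, 0.5, 1.0
x = rng.normal(mu_b, sigma, size=1000)        # samples from the behavior policy
R = -(x - 1.0) ** 2                           # toy reward, peaked at x = 1
logp_b = -0.5 * ((x - mu_b) / sigma) ** 2     # constants cancel in the ratio
logp_t = -0.5 * ((x - mu_t) / sigma) ** 2
score = ((x - mu_t) / sigma ** 2)[:, None]    # d logp_t / d mu_t
grad = snis_policy_gradient(R, logp_b, logp_t, score)  # points toward mu = 1
```

The self-normalization trades a small bias for much lower variance than plain importance sampling, which is what makes reusing samples from several past iterations practical.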
