Metric-Based Imitation Learning Between Two Dissimilar Anthropomorphic Robotic Arms

The development of autonomous robotic systems that learn a desired behavior from human demonstrations, rather than being manually programmed, holds great technological potential. One major challenge in imitation learning is the correspondence problem: how to establish corresponding states and actions between expert and learner when the agents' embodiments differ (in morphology, dynamics, degrees of freedom, etc.). Many existing approaches circumvent the correspondence problem, for example through kinesthetic teaching or teleoperation, where demonstrations are performed directly on the learner robot. In this work we address the correspondence problem explicitly by introducing a distance measure between dissimilar embodiments. This measure is then used as a loss function for static pose imitation and as a feedback signal within a model-free deep reinforcement learning framework for dynamic movement imitation between two anthropomorphic robotic arms in simulation. We find that the measure is well suited to describing the similarity between embodiments and to learning imitation policies by distance minimization.
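
The abstract describes the core idea, a distance measure between dissimilar embodiments used both as a loss for static pose imitation and as a reinforcement learning reward, but gives no implementation details here. As a rough sketch only, the Python snippet below shows one plausible form such a measure could take: a weighted mean of Euclidean distances between corresponding keypoints sampled along each arm's kinematic chain, shaped into a bounded per-step reward. All names and design choices (the keypoint correspondence, the weighting, the exponential shaping) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def embodiment_distance(expert_points, learner_points, weights=None):
    """Hypothetical distance between two arm configurations.

    Both arguments are (K, 3) arrays of Cartesian positions of K
    corresponding keypoints sampled along each arm's kinematic chain
    (e.g., joint origins and the end effector), expressed in a shared
    base frame and normalized by each arm's reach so that dissimilar
    link lengths remain comparable.
    """
    expert_points = np.asarray(expert_points, dtype=float)
    learner_points = np.asarray(learner_points, dtype=float)
    if weights is None:
        weights = np.ones(len(expert_points))
    # Weighted mean of per-keypoint Euclidean distances.
    per_point = np.linalg.norm(expert_points - learner_points, axis=1)
    return float(np.average(per_point, weights=weights))

def imitation_reward(expert_points, learner_points):
    # Dense reward: the closer the learner pose is to the expert pose,
    # the higher the reward. exp(-d) keeps the signal bounded in (0, 1].
    return float(np.exp(-embodiment_distance(expert_points, learner_points)))
```

Under these assumptions, static pose imitation would minimize the distance directly over the learner's joint angles, while dynamic movement imitation would evaluate it at every control step and feed the resulting reward to a model-free algorithm such as PPO.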
