Offline Meta-Reinforcement Learning for Industrial Insertion

Reinforcement learning (RL) can in principle make it possible for robots to automatically adapt to new tasks, but in practice current RL methods require a very large number of trials to accomplish this. In this paper, we tackle rapid adaptation to new tasks through the framework of meta-learning, which utilizes past tasks to learn to adapt, with a specific focus on industrial insertion tasks. We address two specific challenges of applying meta-learning in this setting. First, conventional meta-RL algorithms require lengthy online meta-training phases. We show that this can be replaced with appropriately chosen offline data, resulting in an offline meta-RL method that only requires demonstrations and trials from each of the prior tasks, without the need to run costly meta-RL procedures online. Second, meta-RL methods can fail to generalize to new tasks that are too different from those seen at meta-training time, which poses a particular challenge in industrial applications, where high success rates are critical. We address this by combining contextual meta-learning with direct online finetuning: if the new task is similar to those seen in the prior data, then the contextual meta-learner adapts immediately, and if it is too different, it gradually adapts through finetuning. We show that our approach is able to quickly adapt to a variety of different insertion tasks, learning how to perform them with a success rate of 100% using only a fraction of the samples needed for learning the tasks from scratch. Experiment videos and details are available at https://sites.google.com/view/offline-metarl-insertion.
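The adapt-then-finetune decision described above can be sketched as a simple control loop: first ask the contextual meta-learner for a task embedding from the new task's demonstrations, evaluate the conditioned policy, and only fall back to gradual online finetuning if the immediate adaptation is insufficient. This is a minimal illustrative sketch, not the authors' implementation; all names (`infer_context`, `evaluate`, `finetune`, `AdaptationResult`) are hypothetical placeholders for the paper's components.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class AdaptationResult:
    success_rate: float
    used_finetuning: bool


def adapt_to_new_task(
    infer_context: Callable[[List[Any]], Any],  # contextual meta-learner: demos -> task context
    evaluate: Callable[[Any], float],           # run trials with the conditioned policy -> success rate
    finetune: Callable[[Any], Any],             # one step of online finetuning -> updated context/policy
    demos: List[Any],
    target_rate: float = 1.0,
    max_finetune_steps: int = 50,
) -> AdaptationResult:
    """If the contextual meta-learner already solves the new task, stop there;
    otherwise adapt gradually through online finetuning."""
    ctx = infer_context(demos)
    rate = evaluate(ctx)
    if rate >= target_rate:
        # New task is close to the prior data: immediate adaptation suffices.
        return AdaptationResult(rate, used_finetuning=False)
    # New task is too different: fall back to gradual finetuning.
    for _ in range(max_finetune_steps):
        ctx = finetune(ctx)
        rate = evaluate(ctx)
        if rate >= target_rate:
            break
    return AdaptationResult(rate, used_finetuning=True)
```

As a usage illustration with dummy stand-ins, a task whose context already evaluates at the target rate returns without finetuning, while a harder task triggers the finetuning loop until the target success rate is reached or the step budget runs out.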
