Self-Paced Domain Randomization

Deep Reinforcement Learning (DRL) has seen a surge in publications due to its impressive performance on a variety of tasks. However, this performance comes at a cost: training a deep neural network policy requires a huge amount of data. Acquiring this data on a physical device is expensive in both time and resources, so DRL often relies on simulation, which can provide vast amounts of diverse training data faster than real time. A major problem in this research area is the reality gap, i.e., the discrepancies between the simulated and the real world, which make transferring a policy from the virtual environment to a real robot brittle and difficult. In this paper, we propose a novel application of curriculum learning to domain randomization, called Self-Paced Domain Randomization (SPDR), which puts the Reinforcement Learning (RL) policy “in the loop”. By letting the policy’s current performance influence the automatic generation of the curriculum over domain parameters, we show that SPDR yields higher performance and more stable policies, both in simulated environments and when transferred to real-world platforms.
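
To make the idea concrete, the sketch below illustrates one possible self-paced randomization loop: a sampling distribution over a single simulator parameter is widened only when the policy's evaluated return clears a threshold. This is a minimal illustration of the general mechanism, not the update rule used in the paper; `train_policy`, `evaluate_policy`, the pole-mass parameter, and all constants are hypothetical placeholders.

```python
import numpy as np

# Hypothetical placeholders: in practice these would wrap a simulator
# (e.g., a Gym environment) and a DRL learner such as PPO.
def train_policy(domain_params):
    """Run one training phase on environments built from the sampled parameters."""
    pass

def evaluate_policy(domain_params):
    """Return the policy's average return over the sampled parameters (dummy value here)."""
    return float(np.random.uniform(0.0, 1.0))

# Start with a narrow Gaussian over a single domain parameter (a hypothetical
# pole mass) and widen it only when the policy copes with the current spread.
nominal_mass = 1.0
std = 0.01               # initial spread of the domain-parameter distribution
target_std = 0.3         # spread corresponding to full domain randomization
return_threshold = 0.8   # performance required before the curriculum advances
widen_factor = 1.5

for iteration in range(50):
    masses = np.random.normal(nominal_mass, std, size=16)
    train_policy(masses)
    avg_return = evaluate_policy(masses)

    # The policy is "in the loop": its measured performance gates how quickly
    # the randomization distribution expands toward the target spread.
    if avg_return >= return_threshold and std < target_std:
        std = min(std * widen_factor, target_std)
```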
