Bayesian Domain Randomization for Sim-to-Real Transfer

When learning policies for robot control, the required real-world data is typically prohibitively expensive to acquire, so learning in simulation is a popular strategy. Unfortunately, such policies often do not transfer to the real world due to a mismatch between simulation and reality, called the 'reality gap'. Domain randomization methods tackle this problem by randomizing the physics simulator (source domain) according to a distribution over domain parameters during training, in order to obtain more robust policies that are able to overcome the reality gap. Most domain randomization approaches sample the domain parameters from a fixed distribution. This solution is suboptimal in the context of sim-to-real transferability, since it yields policies that have been trained without explicitly optimizing for the reward on the real system (target domain). Additionally, a fixed distribution assumes prior knowledge about the uncertainty over the domain parameters. Thus, we propose Bayesian Domain Randomization (BayRn), a black-box sim-to-real algorithm that solves tasks efficiently by adapting the domain parameter distribution during learning, based on data sampled from the real-world target domain. BayRn utilizes Bayesian optimization to search the space of source domain distribution parameters for the one that produces a policy maximizing the real-world objective, allowing the distribution to adapt during policy optimization. We experimentally validate the proposed approach by comparing it against two baseline methods on a nonlinear underactuated swing-up task. Our results show that BayRn is capable of performing direct sim-to-real transfer while significantly reducing the required prior knowledge.
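Concretely, BayRn can be read as a bilevel optimization (the notation below is ours, introduced for this summary): the outer loop searches over the distribution parameters $\phi$, while the inner loop performs standard policy optimization in the randomized simulator,

$$\phi^\star = \arg\max_{\phi} \, J_{\text{real}}\big(\pi^\star_\phi\big) \quad \text{with} \quad \pi^\star_\phi = \arg\max_{\pi} \, \mathbb{E}_{\xi \sim p_\phi}\big[J_{\text{sim}}(\pi, \xi)\big],$$

where $\xi$ are the randomized domain parameters (masses, friction coefficients, etc.), $p_\phi$ is the source domain distribution, $J_{\text{sim}}$ is the simulated return, and $J_{\text{real}}$ is estimated from a handful of real-world rollouts. Since each evaluation of $J_{\text{real}}$ costs real experiments, the outer loop is a natural fit for Bayesian optimization with a Gaussian process surrogate.

The following minimal Python sketch shows one way such a loop could be wired up; it is our illustration, not the authors' implementation. The helpers `train_policy_in_sim` and `evaluate_on_real_system` are hypothetical placeholders, and randomizing a single pole-mass parameter is an assumption made for brevity.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def train_policy_in_sim(phi):
    # Placeholder for the inner loop: policy optimization (e.g. PPO) in a
    # simulator whose pole mass is drawn from N(phi[0], phi[1]**2) per episode.
    return phi  # stand-in for a trained policy

def evaluate_on_real_system(policy):
    # Placeholder for the outer objective: average return of a few rollouts
    # on the physical system. Here a toy function standing in for J_real.
    return -(policy[0] - 1.2) ** 2 - 0.5 * policy[1]

# Search space for the distribution parameters phi (assumed bounds).
bounds = np.array([[0.1, 2.0],    # mean of the randomized pole mass [kg]
                   [0.01, 0.5]])  # std  of the randomized pole mass [kg]
rng = np.random.default_rng(0)

def sample_phi(n):
    # Uniform random points in the phi search space.
    return rng.uniform(bounds[:, 0], bounds[:, 1], size=(n, len(bounds)))

def expected_improvement(gp, cand, y_best):
    # Standard EI acquisition for maximization.
    mu, sigma = gp.predict(cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

# Initialize the surrogate with a few random distribution parameters.
X = sample_phi(5)
y = np.array([evaluate_on_real_system(train_policy_in_sim(phi)) for phi in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(15):                          # sim-to-real iterations
    gp.fit(X, y)
    cand = sample_phi(1000)                  # candidates for the acquisition
    phi_next = cand[np.argmax(expected_improvement(gp, cand, y.max()))]
    policy = train_policy_in_sim(phi_next)   # inner loop: learn in simulation
    ret = evaluate_on_real_system(policy)    # outer loop: few real rollouts
    X = np.vstack([X, phi_next])
    y = np.append(y, ret)

print("best distribution parameters:", X[np.argmax(y)])
```

In an actual deployment, `train_policy_in_sim` would run a full policy-optimization algorithm and `evaluate_on_real_system` would average the return of a small number of rollouts on the physical platform; the budget of outer iterations is kept small precisely because each one requires real-world experiments.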
