Robot Learning From Randomized Simulations: A Review

The rise of deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data. Because generating such data sets on a physical platform is prohibitively expensive, state-of-the-art approaches learn in simulation, where data generation is fast and inexpensive, and subsequently transfer the knowledge to the real robot (sim-to-real). Despite becoming increasingly realistic, all simulators remain, by construction, models of reality and are hence inevitably imperfect. This raises the question of how simulators can be modified to facilitate learning robot control policies and to overcome the mismatch between simulation and reality, often called the ‘reality gap’. We provide a comprehensive review of sim-to-real research for robotics, focusing on ‘domain randomization’, a method for learning from randomized simulations.
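The core idea of domain randomization can be illustrated with a minimal sketch: instead of training in a single simulator configuration, each episode samples the simulator's physics parameters from a distribution, so the learned policy must succeed across many plausible models rather than in one imperfect one. The parameter names, nominal values, and uniform ranges below are illustrative assumptions, not taken from any specific simulator or from the paper.

```python
import random

# Hypothetical nominal physics parameters of a simulated robot
# (illustrative names and values only).
NOMINAL = {"mass": 1.0, "friction": 0.8, "motor_gain": 1.0}

def randomize_domain(nominal, rel_spread=0.2, rng=random):
    """Draw one simulator instance by perturbing each nominal
    parameter uniformly within +/- rel_spread of its value."""
    return {
        name: value * rng.uniform(1.0 - rel_spread, 1.0 + rel_spread)
        for name, value in nominal.items()
    }

def train(num_episodes=3, rng=None):
    """Training loop skeleton: one freshly randomized domain per
    episode; the rollout and policy update themselves are omitted."""
    rng = rng or random.Random(0)
    sampled = []
    for _ in range(num_episodes):
        params = randomize_domain(NOMINAL, rng=rng)
        sampled.append(params)
        # ... configure the simulator with `params`, run a rollout,
        #     and update the policy (omitted) ...
    return sampled

episodes = train()
```

Many of the approaches surveyed differ mainly in how this sampling distribution is chosen: fixed and hand-tuned (static randomization), or updated from real-world data (adaptive randomization, e.g. via Bayesian or likelihood-free inference).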
