Learning to Fly via Deep Model-Based Reinforcement Learning

Learning to control robots without engineered models has been a long-standing goal, promising diverse and novel applications. Yet reinforcement learning has had only limited impact on real-time robot control, owing to its heavy demand for real-world interaction. In this work, we learn a thrust-attitude controller for a quadrotor through model-based reinforcement learning, leveraging a learnt probabilistic model of the drone's dynamics. No prior knowledge of the flight dynamics is assumed; instead, a sequential latent variable model, used both generatively and as an online filter, is learnt from raw sensory input. The controller and value function are optimised entirely by propagating stochastic analytic gradients through generated latent trajectories. We show that "learning to fly" can be achieved with less than 30 minutes of experience on a single drone, and that the resulting controller can be deployed on a self-built drone using only onboard computational resources and sensors.
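To make the training scheme concrete, the sketch below illustrates the general idea of optimising a policy by backpropagating analytic gradients through imagined latent rollouts, in the style of stochastic value gradients and Dreamer. This is a minimal illustration, not the authors' implementation: all module names, layer sizes, the reward head, and the action dimensionality (here, thrust plus three attitude commands) are assumptions made for the example.

```python
# Minimal sketch (not the paper's code) of an actor update via analytic
# gradients through imagined rollouts in a learnt stochastic latent model.
import torch
import torch.nn as nn

LATENT, ACTION, HORIZON, GAMMA = 32, 4, 15, 0.99  # 4 = thrust + attitude (assumed)

class Dynamics(nn.Module):
    """Learnt stochastic latent transition p(z' | z, a)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT + ACTION, 128), nn.ELU(),
                                 nn.Linear(128, 2 * LATENT))

    def forward(self, z, a):
        mean, log_std = self.net(torch.cat([z, a], -1)).chunk(2, -1)
        # Reparameterised sample keeps the imagined rollout differentiable.
        return mean + log_std.exp() * torch.randn_like(mean)

dynamics = Dynamics()
reward_head = nn.Sequential(nn.Linear(LATENT, 128), nn.ELU(), nn.Linear(128, 1))
actor = nn.Sequential(nn.Linear(LATENT, 128), nn.ELU(),
                      nn.Linear(128, ACTION), nn.Tanh())
critic = nn.Sequential(nn.Linear(LATENT, 128), nn.ELU(), nn.Linear(128, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

def imagined_return(z):
    """Roll the policy forward in latent space, summing discounted rewards
    and bootstrapping with the value function at the horizon."""
    total = torch.zeros(z.shape[0], 1)
    for t in range(HORIZON):
        a = actor(z)
        z = dynamics(z, a)
        total = total + GAMMA ** t * reward_head(z)
    return total + GAMMA ** HORIZON * critic(z)

# One actor update: maximise the analytic value of imagined trajectories.
z0 = torch.randn(64, LATENT)          # in practice: filtered posterior states
loss = -imagined_return(z0).mean()    # gradients flow through the dynamics
actor_opt.zero_grad()
loss.backward()
actor_opt.step()
```

In a full training loop, the initial states `z0` would come from the filtering posterior of the latent variable model rather than a standard normal, and the critic would be regressed towards the imagined returns in a separate update; both are omitted here for brevity.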
