Robot Reinforcement Learning on the Constraint Manifold

Reinforcement Learning in robotics is extremely challenging, as these tasks raise many practical issues that are normally not considered in the Machine Learning literature. One of the most important is the necessity of satisfying physical and safety constraints throughout the learning process. While many Safe Exploration and Constrained Reinforcement Learning techniques exist, these methods are not yet applicable to real robotics tasks. However, unlike generic Reinforcement Learning environments, in robotics both the model of the agent and the mathematical definition of the constraints can often be considered known. Exploiting this knowledge, we derive a method that learns robotics tasks efficiently in simulation while satisfying the constraints during learning.
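To make the constraint-manifold idea concrete, here is a minimal, hypothetical sketch (not necessarily the authors' exact formulation, and all names below are illustrative): given a known constraint c(q) = 0 with Jacobian J(q) = dc/dq, joint velocities that preserve the constraint to first order lie in the null space of J(q), which can be computed with an SVD. A policy can then act in these tangent-space coordinates, while a pseudoinverse correction term compensates numerical drift off the manifold.

import numpy as np

def constraint(q):
    # Hypothetical example constraint: the tip of a 2-link planar arm with
    # unit link lengths stays at height 1: c(q) = sin(q1) + sin(q1+q2) - 1.
    return np.array([np.sin(q[0]) + np.sin(q[0] + q[1]) - 1.0])

def constraint_jacobian(q):
    # Analytic Jacobian dc/dq of the constraint above (a 1 x 2 matrix).
    return np.array([[np.cos(q[0]) + np.cos(q[0] + q[1]),
                      np.cos(q[0] + q[1])]])

def tangent_space_basis(J, tol=1e-10):
    # Orthonormal basis of the null space of J via SVD: the rows of Vt with
    # vanishing singular values span the tangent space of {q : c(q) = 0}.
    _, s, Vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T  # one column per tangent direction

def safe_velocity(q, action):
    # Interpret the raw policy action as tangent-space coordinates, so the
    # resulting joint velocity keeps c(q) = 0 to first order; the pseudoinverse
    # term pulls the state back onto the manifold if numerical drift occurs.
    J = constraint_jacobian(q)
    N = tangent_space_basis(J)
    return N @ action - np.linalg.pinv(J) @ constraint(q)

q = np.array([np.pi / 2, np.pi / 2])      # a constraint-satisfying configuration
print(safe_velocity(q, np.array([0.5])))  # velocity tangent to the manifold

In this sketch the policy's action space has the dimensionality of the tangent space rather than of the full joint space, so exploration noise is, by construction, restricted to constraint-preserving directions.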
