Robot Reinforcement Learning on the Constraint Manifold

Reinforcement Learning in robotics is extremely challenging, as these tasks raise many practical issues that are normally not considered in the Machine Learning literature. One of the most important is the necessity of satisfying physical and safety constraints throughout the learning process. While many Safe Exploration and Constrained Reinforcement Learning techniques exist, these methods are not yet applicable to real robotics tasks. However, unlike generic Reinforcement Learning environments, in robotics both the model of the agent and the mathematical definition of the constraints can often be considered known. Exploiting this knowledge, we derive a method that learns robotics tasks efficiently in simulation while satisfying the constraints during learning.
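To make the constraint-manifold idea concrete, here is a minimal, hypothetical sketch (not necessarily the authors' exact formulation, and all names below are illustrative): given a known constraint c(q) = 0 with Jacobian J(q) = dc/dq, joint velocities that preserve the constraint to first order lie in the null space of J(q), which can be computed with an SVD. A policy can then act in these tangent-space coordinates, while a pseudoinverse correction term compensates numerical drift off the manifold.

import numpy as np

def constraint(q):
    # Hypothetical example constraint: the tip of a 2-link planar arm with
    # unit link lengths stays at height 1: c(q) = sin(q1) + sin(q1+q2) - 1.
    return np.array([np.sin(q[0]) + np.sin(q[0] + q[1]) - 1.0])

def constraint_jacobian(q):
    # Analytic Jacobian dc/dq of the constraint above (a 1 x 2 matrix).
    return np.array([[np.cos(q[0]) + np.cos(q[0] + q[1]),
                      np.cos(q[0] + q[1])]])

def tangent_space_basis(J, tol=1e-10):
    # Orthonormal basis of the null space of J via SVD: the rows of Vt with
    # vanishing singular values span the tangent space of {q : c(q) = 0}.
    _, s, Vt = np.linalg.svd(J)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T  # one column per tangent direction

def safe_velocity(q, action):
    # Interpret the raw policy action as tangent-space coordinates, so the
    # resulting joint velocity keeps c(q) = 0 to first order; the pseudoinverse
    # term pulls the state back onto the manifold if numerical drift occurs.
    J = constraint_jacobian(q)
    N = tangent_space_basis(J)
    return N @ action - np.linalg.pinv(J) @ constraint(q)

q = np.array([np.pi / 2, np.pi / 2])      # a constraint-satisfying configuration
print(safe_velocity(q, np.array([0.5])))  # velocity tangent to the manifold

In this sketch the policy's action space has the dimensionality of the tangent space rather than of the full joint space, so exploration noise is, by construction, restricted to constraint-preserving directions.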
