Sampling-based Inverse Reinforcement Learning Algorithms with Safety Constraints

Planning for robotic systems is frequently formulated as an optimization problem. Rather than being tuned manually, the parameters of the cost function can be learned from human demonstrations via Inverse Reinforcement Learning (IRL). Common IRL approaches employ a maximum entropy trajectory distribution that can be learned with soft reinforcement learning, in which reward maximization is regularized with an entropy objective. Accounting for safety constraints is of paramount importance for human-robot collaboration. For this reason, our work addresses maximum entropy IRL in constrained environments. Our contribution to this research area is threefold: (1) We propose Constrained Soft Reinforcement Learning (CSRL), an extension of soft reinforcement learning to Constrained Markov Decision Processes (CMDPs). (2) We transfer maximum entropy IRL to CMDPs based on CSRL. (3) We show that using importance sampling for maximum entropy IRL in constrained environments introduces a bias and fails to achieve feature matching. In our evaluation, we consider the tactical lane-change decision of an autonomous vehicle in a highway scenario modeled in the SUMO traffic simulator.
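For concreteness, the following is a minimal sketch of the kind of objective this setting implies: the entropy-regularized reward maximization of soft reinforcement learning, subject to a CMDP cost constraint. The notation is our assumption and not taken from the paper itself (reward r, cost c, cost budget d, entropy temperature alpha):

\begin{align}
  \max_{\pi}\;& \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t} r(s_t, a_t)
      + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\right]
  && \text{(soft RL objective)} \\
  \text{s.t.}\;& \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t} c(s_t, a_t)\right] \le d
  && \text{(CMDP safety constraint)}
\end{align}

Setting alpha to zero recovers standard constrained RL; dropping the constraint recovers standard soft RL.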