Continuous Motion Planning with Temporal Logic Specifications using Deep Neural Networks

In this paper, we propose a model-free reinforcement learning method to synthesize control policies for motion planning problems with continuous state and action spaces. The robot is modelled as a labeled Markov decision process (MDP) with continuous state and action spaces. Linear temporal logic (LTL) is used to specify high-level tasks. We then train deep neural networks to approximate the value function and policy using an actor-critic reinforcement learning method. The LTL specification is converted into an annotated limit-deterministic Büchi automaton (LDBA), which is used to shape the reward continuously so that dense rewards are available during training. A naive way of solving a motion planning problem with LTL specifications using reinforcement learning is to sample a trajectory and assign a high reward for training only if the trajectory satisfies the entire LTL formula. However, the sample complexity of finding such a trajectory is too high when the LTL formula is complex and the state and action spaces are continuous. As a result, it is very unlikely that enough reward is collected for training if all sample trajectories start from the initial state of the automaton. We therefore propose a method that samples not only an initial state from the state space, but also an arbitrary state of the automaton at the beginning of each training episode. We test our algorithm in simulation using a car-like robot and find that our method can successfully learn policies for different working configurations and LTL specifications.
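The episode-start sampling described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the three-state `ToyAutomaton` (a stand-in for an LDBA encoding "visit region a, then region b"), the `label_of` labeling on the unit square, the uniform sampling distribution, and the function name `sample_episode_start`. It only shows the idea of drawing the automaton state at random instead of always starting from the initial one.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAutomaton:
    """Hypothetical 3-state automaton for 'visit region a, then region b'."""
    initial: int = 0
    accepting: int = 2

    def states(self):
        return [0, 1, 2]

    def step(self, q, label):
        # Advance only when the label the automaton is waiting for is observed.
        if q == 0 and label == "a":
            return 1
        if q == 1 and label == "b":
            return 2
        return q


def label_of(x, y):
    """Hypothetical labeling function on the unit square."""
    if x < 0.2 and y < 0.2:
        return "a"
    if x > 0.8 and y > 0.8:
        return "b"
    return ""


def sample_episode_start(automaton, rng, randomize_automaton=True):
    """Return ((x, y), q): a continuous start state and an automaton state.

    With randomize_automaton=False this is the naive scheme, in which every
    episode starts from the automaton's initial state.
    """
    x, y = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
    if randomize_automaton:
        # Assumption: uniform sampling over all automaton states.
        q = rng.choice(automaton.states())
    else:
        q = automaton.initial
    return (x, y), q


if __name__ == "__main__":
    rng = random.Random(0)
    automaton = ToyAutomaton()
    for _ in range(3):
        print(sample_episode_start(automaton, rng))
```

Because some episodes begin in automaton states that are already close to acceptance, the learner can receive reward for the later parts of the task before it has learned to complete the earlier parts.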
