Policy Search for Path Integral Control

Path integral (PI) control defines a general class of control problems for which the optimal control computation is equivalent to an inference problem that can be solved by evaluating a path integral over state trajectories. However, this potential has remained largely untapped in real-world problems due to two main limitations: first, current approaches can typically only learn open-loop controllers, and second, current sampling procedures are inefficient and do not scale to high-dimensional systems. We introduce the efficient Path Integral Relative-Entropy Policy Search (PI-REPS) algorithm for learning feedback policies with PI control. Our algorithm is inspired by the information-theoretic policy updates often used in policy search. We use these updates to approximate the state trajectory distribution that is known to be optimal from PI control theory. Our approach allows for a principled treatment of different sampling distributions and can be used to estimate many types of parametric or non-parametric feedback controllers. We show that PI-REPS significantly outperforms existing methods and solves tasks that were previously out of reach.
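The core mechanics described above — exponentially reweighting sampled trajectories by their path cost and then fitting a feedback policy to the reweighted distribution — can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 1-D linear dynamics, the fixed temperature `eta`, and the linear policy class are all illustrative assumptions made here for brevity.

```python
import numpy as np

# Conceptual sketch of PI-style policy improvement (illustrative, not PI-REPS
# itself): sample rollouts, reweight them by exponentiated path cost, then fit
# a feedback policy by weighted maximum likelihood.

rng = np.random.default_rng(0)

# Assumed toy system: 1-D dynamics x_{t+1} = x_t + 0.1*u_t, cost = sum of x^2.
n_rollouts, horizon = 50, 20
eta = 1.0  # temperature of the exponential reweighting (assumed fixed here)

states = np.zeros((n_rollouts, horizon + 1))
actions = np.zeros((n_rollouts, horizon))
states[:, 0] = rng.normal(1.0, 0.1, size=n_rollouts)

K = 0.0  # initial gain of the linear feedback policy u = -K * x
for t in range(horizon):
    # Exploratory actions: current policy plus Gaussian exploration noise.
    u = -K * states[:, t] + rng.normal(0.0, 0.3, size=n_rollouts)
    actions[:, t] = u
    states[:, t + 1] = states[:, t] + 0.1 * u

# Path cost per rollout: accumulated squared state error.
costs = np.sum(states[:, 1:] ** 2, axis=1)

# PI-style weights: low-cost trajectories receive exponentially more mass.
w = np.exp(-(costs - costs.min()) / eta)
w /= w.sum()

# Weighted least-squares fit of a new gain, u ~ -K_new * x: a weighted
# maximum-likelihood step toward the reweighted trajectory distribution.
X = states[:, :-1].ravel()
U = actions.ravel()
W = np.repeat(w, horizon)
K_new = -np.sum(W * X * U) / np.sum(W * X * X)
```

In PI-REPS the update is additionally constrained by a relative-entropy bound between successive trajectory distributions, which this sketch omits; here the temperature `eta` simply plays the role of that trust-region trade-off.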
