Stochastic Optimal Control as Approximate Input Inference

Optimal control of stochastic nonlinear dynamical systems is a major challenge in the domain of robot learning. Given the intractability of the global control problem, state-of-the-art algorithms focus on approximate sequential optimization techniques that rely heavily on heuristic regularization in order to achieve stable convergence. Building upon the duality between inference and control, we develop the view of optimal control as input estimation, devising a probabilistic stochastic optimal control formulation that iteratively infers the optimal input distributions by minimizing an upper bound of the control cost. Inference is performed through Expectation Maximization and message passing on a probabilistic graphical model of the dynamical system, and time-varying linear Gaussian feedback controllers are extracted from the joint state-action distribution. This perspective incorporates uncertainty quantification, effective initialization through priors, and the principled regularization inherent to the Bayesian treatment. Moreover, we show that for deterministic linearized systems our framework recovers the maximum entropy linear quadratic optimal control law. We provide a complete and detailed derivation of our probabilistic approach and highlight its advantages in comparison to other deterministic and probabilistic solvers.
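The core idea can be illustrated on the simplest case the abstract mentions: a linear-Gaussian system with quadratic cost. The sketch below is an assumed minimal instance of control-as-input-inference, not the paper's exact message-passing algorithm: the quadratic state cost is treated as a Gaussian "observation" of the target, the quadratic input cost becomes a Gaussian prior on the input, an RTS smoother runs over the augmented state `s_t = [x_t, u_t]`, and a linear feedback gain is read off the smoothed joint covariance as `K_t = Sigma_ux / Sigma_xx`. The function name and all numerical settings are illustrative.

```python
import numpy as np

def lqr_by_smoothing(a, b, q, r, T, x0):
    """Infer inputs for x' = a*x + b*u with cost q*x^2 + r*u^2 by smoothing.

    Cost terms are encoded probabilistically: the state cost as a pseudo-
    observation of the target 0 with noise variance 1/q, the input cost as
    a zero-mean Gaussian prior on u with variance 1/r.
    """
    F = np.array([[a, b], [0.0, 0.0]])   # augmented dynamics for s = [x, u]
    W = np.diag([1e-9, 1.0 / r])         # near-deterministic x, input prior on u
    H = np.array([[1.0, 0.0]])           # "observe" the state component only
    V = np.array([[1.0 / q]])            # cost weight q -> observation noise 1/q
    m, P = np.array([x0, 0.0]), np.diag([1e-9, 1.0 / r])
    ms, Ps, mp, Pp = [], [], [], []
    for t in range(T):
        # measurement update: pull the state toward the target 0
        S = H @ P @ H.T + V
        K = P @ H.T @ np.linalg.inv(S)
        m = m + (K @ (0.0 - H @ m)).ravel()
        P = P - K @ S @ K.T
        ms.append(m); Ps.append(P)
        # time update: propagate through the augmented dynamics
        mp.append(F @ m); Pp.append(F @ P @ F.T + W)
        m, P = mp[-1], Pp[-1]
    # RTS backward pass: future cost messages inform earlier inputs
    sm, sP = [ms[-1]], [Ps[-1]]
    for t in range(T - 2, -1, -1):
        G = Ps[t] @ F.T @ np.linalg.inv(Pp[t])
        sm.insert(0, ms[t] + G @ (sm[0] - mp[t]))
        sP.insert(0, Ps[t] + G @ (sP[0] - Pp[t]) @ G.T)
    # feedback gains from the smoothed joint (x, u) covariance
    gains = [Pt[1, 0] / Pt[0, 0] for Pt in sP]
    return sm, sP, gains
```

For an unstable scalar system (e.g. `a = 1.2`), the smoothed cross-covariance between `x_t` and `u_t` is negative, so the extracted gains are stabilizing, consistent with the claim that the linear-Gaussian case recovers a linear quadratic control law.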
