General duality between optimal control and estimation

Optimal control and estimation are dual in the LQG setting, as Kalman discovered; however, this duality has proven difficult to extend beyond LQG. Here we obtain a more natural form of LQG duality by replacing the Kalman-Bucy filter with the information filter. We then generalize this result to non-linear stochastic systems, discrete stochastic systems, and deterministic systems. All forms of duality are established by relating exponentiated costs to probabilities. Unlike the LQG setting, where control and estimation are in one-to-one correspondence, in the general case control turns out to be a larger problem class than estimation, and only a sub-class of control problems has estimation duals. These are problems where the Bellman equation is intrinsically linear. Apart from their theoretical significance, our results make it possible to apply estimation algorithms to control problems and vice versa.
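To make the phrase "the Bellman equation is intrinsically linear" concrete, the following is a minimal sketch on a discrete first-exit problem. It is not taken from the abstract. It assumes the formulation used in the linearly-solvable Markov decision problem literature: the controller reshapes passive transition probabilities p(y|x) into u(y|x) and pays a state cost q(x) plus the KL divergence KL(u || p). Under that assumption the exponentiated cost-to-go z(x) = exp(-v(x)) satisfies the linear equation z(x) = exp(-q(x)) * sum_y p(y|x) z(y), and the optimal transitions are proportional to p(y|x) z(y). The five-state chain, the cost values, and all function names are hypothetical and chosen only for illustration.

```python
import math

def solve_desirability(P, q, terminal, iters=2000):
    """Fixed-point iteration for z(x) = exp(-q(x)) * sum_y P[x][y] * z(y).

    Assumed setting: first-exit problem, KL control cost.
    z at terminal states is pinned to exp(-q), i.e. exp(-final cost).
    """
    n = len(P)
    z = [1.0] * n
    for t in terminal:
        z[t] = math.exp(-q[t])
    for _ in range(iters):
        z_new = list(z)
        for x in range(n):
            if x in terminal:
                continue
            z_new[x] = math.exp(-q[x]) * sum(P[x][y] * z[y] for y in range(n))
        z = z_new
    return z

def optimal_transitions(P, z):
    """Optimal controlled transitions: u*(y|x) proportional to P[x][y] * z(y)."""
    U = []
    for row in P:
        w = [p * zy for p, zy in zip(row, z)]
        s = sum(w)
        U.append([wi / s for wi in w])
    return U

def bellman_residual(P, q, z, U, terminal):
    """Largest gap in v(x) = q(x) + KL(u* || p) + E_{u*}[v] over non-terminal x."""
    v = [-math.log(zi) for zi in z]
    worst = 0.0
    for x in range(len(P)):
        if x in terminal:
            continue
        kl = sum(u * math.log(u / p) for u, p in zip(U[x], P[x]) if u > 0)
        ev = sum(u * vy for u, vy in zip(U[x], v))
        worst = max(worst, abs(v[x] - (q[x] + kl + ev)))
    return worst

# Hypothetical example: random walk on states 0..4, reflecting at 0, goal at 4.
P = [
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
q = [0.2, 0.2, 0.2, 0.2, 0.0]   # state costs; the goal state has zero final cost
terminal = {4}

z = solve_desirability(P, q, terminal)
U = optimal_transitions(P, z)
residual = bellman_residual(P, q, z, U, terminal)
print("desirability z:", [round(zi, 4) for zi in z])
print("Bellman residual:", residual)
```

In this sketch the minimization over controls never appears explicitly: solving a linear equation in z and renormalizing p(y|x) z(y) is enough, which mirrors how a backward filtering pass reweights a prior by a likelihood. That resemblance is the sense in which exponentiated costs behave like probabilities here.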
