Online Linear Quadratic Control

We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses. We present the first efficient online learning algorithms in this setting that guarantee $O(\sqrt{T})$ regret under mild assumptions, where $T$ is the time horizon. Our algorithms rely on a novel SDP relaxation for the steady-state distribution of the system. Crucially, and in contrast to previously proposed relaxations, the feasible solutions of our SDP all correspond to "strongly stable" policies that mix exponentially fast to a steady state.
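
To make the notion of a steady state concrete, the following minimal sketch computes the stationary state covariance and the resulting average quadratic cost for a single fixed linear policy. It is not the SDP relaxation or the online algorithm from the abstract. It assumes dynamics of the form x_{t+1} = A x_t + B u_t + w_t with noise covariance W, a policy u_t = K x_t, and a loss x'Qx + u'Ru. All of these symbols and the helper function names are illustrative assumptions, not notation taken from the text above.

```python
# Illustrative sketch only: steady-state covariance and cost of a fixed
# linear policy u = K x. All names (A, B, K, W, Q, R) are assumptions.

def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

def steady_state_covariance(A, B, K, W, iters=500):
    """Iterate X <- M X M^T + W with closed-loop matrix M = A + B K.

    The iteration converges geometrically when the spectral radius of M
    is below 1, i.e. when the policy K is stable.
    """
    M = add(A, matmul(B, K))
    Mt = transpose(M)
    X = [row[:] for row in W]
    for _ in range(iters):
        X = add(matmul(matmul(M, X), Mt), W)
    return X

def steady_state_cost(Q, R, K, X):
    """Average cost tr(Q X) + tr(R K X K^T) at the steady state."""
    U = matmul(matmul(K, X), transpose(K))  # covariance of the controls
    return trace(matmul(Q, X)) + trace(matmul(R, U))

# A hypothetical two-dimensional system with a stabilizing gain.
A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.0], [0.1]]
K = [[-1.0, -2.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
R = [[1.0]]

X = steady_state_covariance(A, B, K, W)
cost = steady_state_cost(Q, R, K, X)
```

The cost is linear in the pair (X, K X K^T), which is one reason steady-state covariances are a natural object to optimize over in a convex relaxation.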
