Paper Information - Online Linear Quadratic Control

Online Linear Quadratic Control

We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses. We present the first efficient online learning algorithms in this setting that guarantee $O(\sqrt{T})$ regret under mild assumptions, where $T$ is the time horizon. Our algorithms rely on a novel SDP relaxation for the steady-state distribution of the system. Crucially, and in contrast to previously proposed relaxations, the feasible solutions of our SDP all correspond to "strongly stable" policies that mix exponentially fast to a steady state.
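As a rough illustration of the steady-state quantities the abstract refers to, the following minimal sketch considers a scalar system $x_{t+1} = a x_t + b u_t + w_t$ under a fixed linear policy $u_t = k x_t$ with a fixed quadratic cost $q x_t^2 + r u_t^2$. The scalar setting, the function names, and the parameter values are assumptions made for illustration; this is not the paper's algorithm or its SDP relaxation, only the closed-form steady state that such a policy induces when $|a + bk| < 1$.

```python
def steady_state_variance(a, b, k, noise_var):
    """Steady-state variance of x for x' = a*x + b*u + w with u = k*x.

    Assumes zero-mean noise w with variance noise_var (illustrative setup).
    """
    closed_loop = a + b * k
    if abs(closed_loop) >= 1.0:
        raise ValueError("policy is not stable: |a + b*k| >= 1")
    # Fixed point of v = closed_loop^2 * v + noise_var.
    return noise_var / (1.0 - closed_loop ** 2)


def steady_state_cost(a, b, k, q, r, noise_var):
    """Expected per-step cost q*x^2 + r*u^2 once the state has mixed."""
    # With u = k*x, the cost is (q + r*k^2) * E[x^2].
    return (q + r * k ** 2) * steady_state_variance(a, b, k, noise_var)


def variance_after(a, b, k, noise_var, steps, initial_var=0.0):
    """Iterate the variance recursion v <- (a + b*k)^2 * v + noise_var."""
    v = initial_var
    for _ in range(steps):
        v = (a + b * k) ** 2 * v + noise_var
    return v
```

For example, with `a = b = 1.0`, `k = -0.5`, and unit noise variance, the closed-loop factor is 0.5, so the steady-state variance is 4/3 and the gap between `variance_after(...)` and that value shrinks by a factor of 0.25 per step. This geometric decay is the scalar analogue of a policy that mixes exponentially fast to its steady state.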
