Paper information - Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems

We study the problem of system identification and adaptive control in partially observable linear dynamical systems. Adaptive and closed-loop system identification is a challenging problem due to correlations introduced in data collection. In this paper, we present the first model estimation method with finite-time guarantees in both open-loop and closed-loop system identification. Deploying this estimation method, we propose adaptive control online learning (AdaptOn), an efficient reinforcement learning algorithm that adaptively learns the system dynamics and continuously updates its controller through online learning steps. AdaptOn estimates the model dynamics by occasionally solving a linear regression problem through interactions with the environment. Using policy re-parameterization and the estimated model, AdaptOn constructs counterfactual loss functions to be used for updating the controller through online gradient descent. Over time, AdaptOn improves its model estimates and obtains more accurate gradient updates to improve the controller. We show that AdaptOn achieves a regret upper bound of $\text{polylog}\left(T\right)$ after $T$ time steps of agent-environment interaction. To the best of our knowledge, AdaptOn is the first algorithm that achieves $\text{polylog}\left(T\right)$ regret in adaptive control of unknown partially observable linear dynamical systems, a class that includes linear quadratic Gaussian (LQG) control.
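To make the "linear regression" step in the abstract concrete, here is a minimal sketch of estimating the first few Markov parameters $CA^{i}B$ of a partially observable linear system by ordinary least squares. It is a simplified open-loop illustration and not the estimator from the paper: the system matrices, the horizon `H`, the noise level, and the function names `simulate` and `estimate_markov` are all assumptions chosen for the example, and the closed-loop setting the abstract addresses needs a more careful regression than this one.

```python
import numpy as np

def simulate(A, B, C, T, noise_std, rng):
    """Roll out x_{t+1} = A x_t + B u_t + w_t, y_t = C x_t + z_t with i.i.d. Gaussian inputs."""
    n, m = B.shape
    p = C.shape[0]
    x = np.zeros(n)
    U = rng.standard_normal((T, m))   # open-loop exploratory inputs (assumption)
    Y = np.zeros((T, p))
    for t in range(T):
        Y[t] = C @ x + noise_std * rng.standard_normal(p)
        x = A @ x + B @ U[t] + noise_std * rng.standard_normal(n)
    return U, Y

def estimate_markov(U, Y, H):
    """Least-squares fit of y_t on [u_{t-1}, ..., u_{t-H}].

    Returns an array of shape (p, H*m) whose i-th block of m columns
    estimates the Markov parameter C A^i B, for i = 0, ..., H-1.
    """
    T, m = U.shape
    # Row for time t stacks the H most recent inputs, newest first.
    Phi = np.stack([U[t - H:t][::-1].ravel() for t in range(H, T)])
    G_T, *_ = np.linalg.lstsq(Phi, Y[H:], rcond=None)
    return G_T.T

# Hypothetical stable system used only for illustration.
A = np.array([[0.5, 0.1], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
H = 5

rng = np.random.default_rng(0)
U, Y = simulate(A, B, C, T=5000, noise_std=0.1, rng=rng)
G_hat = estimate_markov(U, Y, H)

# Ground-truth Markov parameters [CB, CAB, ..., CA^{H-1}B] for comparison.
G_true = np.hstack([C @ np.linalg.matrix_power(A, i) @ B for i in range(H)])
print("max abs error:", np.max(np.abs(G_hat - G_true)))
```

Because the inputs here are i.i.d. and independent of the state, the truncated tail of the impulse response only adds variance to the fit. Under feedback the inputs depend on past outputs, which is the correlation the abstract identifies as the difficulty in closed-loop identification.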
