Regret-Optimal Estimation and Control

We consider estimation and control in linear time-varying dynamical systems from the perspective of regret minimization. Unlike most prior work in this area, we focus on designing causal estimators and controllers that compete against a clairvoyant noncausal policy, rather than against the best policy selected in hindsight from some fixed parametric class. We show that the regret-optimal estimator and the regret-optimal controller can be derived in state-space form using operator-theoretic techniques from robust control, and we present tight, data-dependent bounds on the regret incurred by our algorithms in terms of the energy of the disturbances. Our results can be viewed as extending traditional robust estimation and control, which focuses on minimizing worst-case cost, to minimizing worst-case regret. We propose regret-optimal analogs of Model Predictive Control (MPC) and the Extended Kalman Filter (EKF) for systems with nonlinear dynamics, and we present numerical experiments showing that our regret-optimal algorithms can significantly outperform standard approaches to estimation and control.
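
For concreteness, here is a minimal sketch of the regret criterion in the standard full-information linear-quadratic setting; the symbols $J_\pi$, $w$, and the unit-energy normalization are our illustrative notation, not quoted from the paper. A causal policy $\pi$ incurs regret equal to its excess cost over the best clairvoyant noncausal policy on the same disturbance sequence $w$:
\[
  \mathrm{Regret}(\pi, w) \;=\; J_\pi(w) \;-\; \min_{\pi'\ \mathrm{noncausal}} J_{\pi'}(w),
\]
and the regret-optimal causal policy minimizes the worst case of this quantity over all disturbance sequences of bounded energy:
\[
  \pi^\star \;\in\; \operatorname*{arg\,min}_{\pi\ \mathrm{causal}} \; \sup_{\|w\|_2 \le 1} \mathrm{Regret}(\pi, w).
\]
Contrast this with classical $H_\infty$-style robust control, which would instead minimize $\sup_{\|w\|_2 \le 1} J_\pi(w)$, i.e., the worst-case cost itself rather than the gap to the noncausal benchmark.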
