Regret-Optimal Full-Information Control

We consider the infinite-horizon, discrete-time, full-information control problem. Motivated by learning theory, we adopt regret as the criterion for controller design, defined as the difference between the LQR cost of a causal controller (which has access only to past and current disturbances) and the LQR cost of a clairvoyant one (which also has access to future disturbances). In the full-information setting there is a unique optimal non-causal controller that dominates all other controllers in terms of LQR cost, and we measure regret against this particular controller. Since the regret is itself a function of the disturbances, we consider the worst-case regret over all bounded-energy disturbances, and propose to find a causal controller that minimizes this worst-case regret. The resulting controller guarantees the smallest possible regret relative to the best non-causal controller that can see the future, no matter what the disturbances are. We show that the regret-optimal control problem reduces to a Nehari extension problem, i.e., to approximating an anticausal operator by a causal one in the operator norm. In the state-space setting we obtain explicit formulas for the optimal regret and for the regret-optimal controller (in both the causal and the strictly causal settings). The regret-optimal controller is the sum of the classical H2 state-feedback law and an n-th order controller (where n is the state dimension of the plant) obtained from the Nehari problem. Its construction requires only the solution of the standard LQR Riccati equation, together with two Lyapunov equations. Simulations over a range of plants demonstrate that the regret-optimal controller interpolates between the H2- and H∞-optimal controllers, and generally achieves H2 and H∞ costs that are simultaneously close to their optimal values. The regret-optimal controller thus presents itself as a viable option for control-system design.
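To make the criterion concrete, the regret and its worst case can be written as follows (a sketch in our own notation, which the text above does not fix: J(K; w) is the LQR cost of controller K under the disturbance sequence w, and K_0 is the unique optimal non-causal controller):

```latex
\mathrm{Regret}(K; w) \;=\; J(K; w) \;-\; J(K_0; w),
\qquad
\min_{K \ \mathrm{causal}} \ \sup_{\|w\|_2 \le 1} \ \mathrm{Regret}(K; w).
```

The inner supremum is the worst case over bounded-energy disturbances; the outer minimization defines the regret-optimal causal controller.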
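The controller construction can also be sketched numerically. The snippet below is a minimal illustration under stated assumptions, not the paper's exact procedure: it solves the standard LQR Riccati equation for the H2 state-feedback gain and then two discrete Lyapunov equations. The specific Lyapunov equations shown (closed-loop Gramians) are placeholders for the ones derived in the paper, which are not given here, and the plant matrices are likewise illustrative.

```python
import numpy as np
from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

def h2_state_feedback(A, B, Q, R):
    """Classical H2/LQR gain from the standard discrete-time Riccati equation."""
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # u_t = -K x_t
    return K, P

# Illustrative double-integrator-style plant (not from the paper).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.eye(1)

K, P = h2_state_feedback(A, B, Q, R)
A_cl = A - B @ K  # stable closed-loop dynamics under the H2 law

# The regret-optimal controller adds an n-th order term obtained from the
# Nehari problem, whose construction involves two Lyapunov equations.
# The Gramian-style equations below are assumed stand-ins:
#   A_cl X A_cl^T - X + B B^T = 0   and   A_cl^T Y A_cl - Y + Q = 0.
X = solve_discrete_lyapunov(A_cl, B @ B.T)
Y = solve_discrete_lyapunov(A_cl.T, Q)
```

The computational cost matches what is stated above: one Riccati solve plus two Lyapunov solves, all in the plant's state dimension n.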
