Stability of Controllers for Gaussian Process Dynamics

Learning control has become an appealing alternative to deriving control laws from classical control theory. However, a major shortcoming of learning control is the lack of performance guarantees, which prevents its application in many real-world scenarios. As a step towards widespread deployment of learning control, we provide stability analysis tools for controllers acting on dynamics represented by Gaussian processes (GPs). We consider differentiable Markovian control policies and system dynamics given as (i) the mean of a GP, and (ii) the full GP distribution. For both cases, we analyze finite and infinite time horizons. Furthermore, we study the effect of disturbances on the stability results. Empirical evaluations on simulated benchmark problems support our theoretical results.
