Model-based policy search for automatic tuning of multivariate PID controllers

PID control architectures are widely used in industrial applications. Despite their small number of open parameters, tuning multiple coupled PID controllers can become tedious in practice. In this paper, we extend PILCO, a model-based policy search framework, to automatically tune multivariate PID controllers based purely on data observed on an otherwise unknown system. The system's state is extended appropriately so that the PID policy can be framed as a static state feedback policy. This makes it possible to pose PID tuning as the solution of a finite-horizon optimal control problem without further a priori knowledge. The framework is applied to the task of balancing an inverted pendulum on a seven-degree-of-freedom robotic arm, demonstrating fast and data-efficient policy learning, even on complex real-world problems.
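The state extension mentioned above can be illustrated with a minimal sketch for a single scalar loop. It assumes one common discrete-time construction, which is not necessarily the exact one used in the paper: the state is augmented with the previous error and the accumulated error, so that the PID law becomes a static linear map of the augmented state. All names here (`pid_classic`, `pid_state_feedback`, the sample time `dt`, the gains) are hypothetical and chosen only for illustration.

```python
import numpy as np

def pid_classic(errors, kp, ki, kd, dt):
    """Textbook discrete PID with explicit integral and derivative terms."""
    integral, prev_e, out = 0.0, 0.0, []
    for e in errors:
        integral += e * dt                 # rectangular integration
        derivative = (e - prev_e) / dt     # backward difference
        out.append(kp * e + ki * integral + kd * derivative)
        prev_e = e
    return np.array(out)

def pid_state_feedback(errors, kp, ki, kd, dt):
    """Same controller written as u = K @ z on an augmented state z.

    Assumed augmentation: z = [e_t, e_{t-1}, I_t], where I_t is the
    accumulated error. The gain row K is constant, so the policy is a
    static state feedback law on z.
    """
    K = np.array([kp + kd / dt, -kd / dt, ki])
    integral, prev_e, out = 0.0, 0.0, []
    for e in errors:
        integral += e * dt
        z = np.array([e, prev_e, integral])  # augmented state
        out.append(K @ z)
        prev_e = e
    return np.array(out)

errors = np.array([1.0, 0.5, 0.25, 0.0, -0.1])
u_classic = pid_classic(errors, kp=2.0, ki=0.5, kd=0.1, dt=0.01)
u_feedback = pid_state_feedback(errors, kp=2.0, ki=0.5, kd=0.1, dt=0.01)
print(np.allclose(u_classic, u_feedback))
```

Under this assumed construction, tuning the PID gains amounts to choosing the entries of the constant gain `K`, which is what allows a policy search method that optimizes static feedback policies to be applied. In the multivariate case `K` would become a matrix acting on stacked error vectors.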