Automatic LQR Tuning Based on Gaussian Process Optimization: Early Experimental Results

This paper proposes an automatic controller tuning framework based on linear optimal control combined with Bayesian optimization. With this framework, an initial set of controller gains is automatically improved according to a pre-defined performance objective evaluated from experimental data. The underlying Bayesian optimization algorithm is Entropy Search, which represents the latent objective as a Gaussian process and constructs an explicit belief over the location of the objective minimum. This belief is used to maximize the information gain of each experimental evaluation, so the framework is expected to yield improved controllers with fewer evaluations than alternative approaches. A seven-degree-of-freedom robot arm balancing an inverted pole is used as the experimental demonstrator. Preliminary results on a low-dimensional tuning problem highlight the method's potential for automatic controller tuning on robotic platforms.
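
In the LQR setting, a natural parameterization is the design weights of the quadratic cost: each candidate weight yields a gain matrix via the Riccati equation, the resulting controller is evaluated experimentally under a fixed performance objective, and a Gaussian process model of the observed costs selects the next weights to try. The sketch below is a minimal illustration of this loop, not the paper's implementation: a simulated two-state linear plant stands in for the robot experiment, a single log-scaled design weight is tuned, and Expected Improvement is used as a simpler stand-in for the Entropy Search acquisition; all plant matrices, weights, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of an automatic LQR tuning loop (assumptions noted above).
import numpy as np
from scipy.linalg import solve_continuous_are
from scipy.stats import norm

# Linearized plant x_dot = A x + B u (illustrative values, not the robot model).
A = np.array([[0.0, 1.0],
              [4.0, 0.0]])          # unstable, pendulum-like
B = np.array([[0.0],
              [1.0]])

# Fixed performance weights defining the tuning objective J.
Q_perf = np.diag([10.0, 1.0])
R_perf = np.array([[0.1]])


def lqr_gain(theta):
    """Design an LQR gain from a scalar design parameter theta (log10 weight)."""
    Q_design = np.diag([10.0 ** theta, 1.0])
    R_design = np.array([[1.0]])
    P = solve_continuous_are(A, B, Q_design, R_design)
    return np.linalg.solve(R_design, B.T @ P)     # K = R^{-1} B^T P


def evaluate_cost(theta, dt=0.01, steps=500):
    """Stand-in for the hardware experiment: simulate and accumulate the cost."""
    K = lqr_gain(theta)
    x, J = np.array([0.2, 0.0]), 0.0
    for _ in range(steps):
        u = -K @ x
        J += (x @ Q_perf @ x + u @ R_perf @ u) * dt
        x = x + dt * (A @ x + B @ u)              # simple Euler step
    return J + 0.01 * np.random.randn()           # noisy evaluation, as on hardware


def gp_posterior(X, y, Xs, ell=0.5, sf=1.0, sn=0.05):
    """GP posterior mean/std with a squared-exponential kernel."""
    def k(a, b):
        return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)
    L = np.linalg.cholesky(k(X, X) + sn**2 * np.eye(len(X)))
    Kxs = k(X, Xs)
    mu = Kxs.T @ np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Kxs)
    var = sf**2 - np.sum(v**2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mu, std, best):
    z = (best - mu) / std
    return (best - mu) * norm.cdf(z) + std * norm.pdf(z)


# Tuning loop: each iteration picks the next design weight, runs one
# (simulated) experiment, and updates the GP belief over the cost.
thetas = np.array([-1.0, 1.0])                    # initial design-weight guesses
costs = np.array([evaluate_cost(t) for t in thetas])
candidates = np.linspace(-2.0, 2.0, 200)
for _ in range(10):
    mu, std = gp_posterior(thetas, costs, candidates)
    theta_next = candidates[np.argmax(expected_improvement(mu, std, costs.min()))]
    thetas = np.append(thetas, theta_next)
    costs = np.append(costs, evaluate_cost(theta_next))

print("best design weight: 10^%.2f, cost %.3f" % (thetas[np.argmin(costs)], costs.min()))
```

Each iteration of this loop corresponds to one experiment on the hardware, which is why an information-efficient acquisition such as Entropy Search is attractive: the dominant cost is the physical evaluation, not the computation needed to choose the next candidate.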
