Discounted near-optimal control of general continuous-action nonlinear systems using optimistic planning

We propose an optimistic planning method that searches for near-optimal sequences of actions in discrete-time, infinite-horizon optimal control problems with discounted rewards. The dynamics are general nonlinear, while the action (input) is scalar and takes values in a compact interval. The method works by iteratively splitting the infinite-dimensional search space of action sequences into hyperboxes. Under appropriate conditions on the dynamics and rewards, we analyze how quickly the range of possible values within each box shrinks. Coupled with a measure of problem complexity, this analysis yields an overall convergence rate of the algorithm to the infinite-horizon optimum, as a function of the computation invested. We provide simulation results showing that the algorithm is useful in practice and comparing it with two alternative planning methods.
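To make the box-splitting idea concrete, below is a minimal Python sketch of this style of optimistic planning. It is not the paper's exact algorithm: it truncates the infinite-dimensional sequence space to a fixed horizon and bounds the discarded reward tail by gamma^h / (1 - gamma), and it assumes actions normalized to [0, 1], rewards in [0, 1], a known Lipschitz constant L for the value as a function of the action sequence, and user-supplied dynamics f(x, u) and reward rho(x, u) (all hypothetical names). At each step it expands the hyperbox with the largest upper bound on the value and bisects it along its widest dimension.

    # Minimal sketch of optimistic planning over hyperboxes of action
    # sequences, under the assumptions stated above. The names f, rho,
    # and L are illustrative, not from the paper.
    import heapq

    def optimistic_plan(f, rho, x0, gamma=0.9, L=1.0, budget=200, horizon=8):
        """Search for a near-optimal action sequence of length `horizon` from x0."""

        def evaluate(seq):
            # Discounted return of a finite action sequence, simulated from x0.
            x, ret = x0, 0.0
            for k, u in enumerate(seq):
                ret += (gamma ** k) * rho(x, u)
                x = f(x, u)
            return ret

        def score(box):
            # Upper bound on the value anywhere in the box: value at the
            # center, plus Lipschitz slack over the box diameter, plus the
            # reward tail beyond the truncated horizon (rewards in [0, 1]).
            center = [(lo + hi) / 2.0 for lo, hi in box]
            v = evaluate(center)
            diam = max(hi - lo for lo, hi in box)
            tail = gamma ** len(box) / (1.0 - gamma)
            return v + L * diam + tail, v, center

        root = [(0.0, 1.0)] * horizon          # actions normalized to [0, 1]
        b, v, c = score(root)
        heap = [(-b, c, root)]                 # max-heap on the upper bound
        best_val, best_seq = v, c

        for _ in range(budget):
            _, _, box = heapq.heappop(heap)    # most optimistic box so far
            d = max(range(horizon), key=lambda i: box[i][1] - box[i][0])
            lo, hi = box[d]
            mid = (lo + hi) / 2.0
            for half in ((lo, mid), (mid, hi)):  # bisect the widest dimension
                child = list(box)
                child[d] = half
                b, v, c = score(child)
                if v > best_val:
                    best_val, best_seq = v, c
                heapq.heappush(heap, (-b, c, child))
        return best_seq, best_val

For instance, with a toy system such as f = lambda x, u: x + 0.1 * (2 * u - 1) and rho = lambda x, u: max(0.0, 1.0 - abs(x)), the routine returns a near-optimal finite plan; in a receding-horizon loop one would apply only the first action and replan from the resulting state.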
