Discounted near-optimal control of general continuous-action nonlinear systems using optimistic planning

We propose an optimistic planning method that searches for near-optimal sequences of actions in discrete-time, infinite-horizon optimal control problems with discounted rewards. The dynamics are general nonlinear, while the action (input) is scalar and takes values in a compact interval. The method works by iteratively splitting the infinite-dimensional search space of action sequences into hyperboxes. Under appropriate conditions on the dynamics and rewards, we analyze how quickly the range of possible values within each box shrinks. Coupled with a measure of problem complexity, this analysis yields an overall convergence rate of the algorithm to the infinite-horizon optimum, as a function of the computation invested. We provide simulation results showing that the algorithm is useful in practice and comparing it with two alternative planning methods.
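To make the box-splitting idea concrete, below is a minimal Python sketch of this style of optimistic planning. It is not the paper's exact algorithm: it truncates the infinite-dimensional sequence space to a fixed horizon and bounds the discarded reward tail by gamma^h / (1 - gamma), and it assumes actions normalized to [0, 1], rewards in [0, 1], a known Lipschitz constant L for the value as a function of the action sequence, and user-supplied dynamics f(x, u) and reward rho(x, u) (all hypothetical names). At each step it expands the hyperbox with the largest upper bound on the value and bisects it along its widest dimension.

    # Minimal sketch of optimistic planning over hyperboxes of action
    # sequences, under the assumptions stated above. The names f, rho,
    # and L are illustrative, not from the paper.
    import heapq

    def optimistic_plan(f, rho, x0, gamma=0.9, L=1.0, budget=200, horizon=8):
        """Search for a near-optimal action sequence of length `horizon` from x0."""

        def evaluate(seq):
            # Discounted return of a finite action sequence, simulated from x0.
            x, ret = x0, 0.0
            for k, u in enumerate(seq):
                ret += (gamma ** k) * rho(x, u)
                x = f(x, u)
            return ret

        def score(box):
            # Upper bound on the value anywhere in the box: value at the
            # center, plus Lipschitz slack over the box diameter, plus the
            # reward tail beyond the truncated horizon (rewards in [0, 1]).
            center = [(lo + hi) / 2.0 for lo, hi in box]
            v = evaluate(center)
            diam = max(hi - lo for lo, hi in box)
            tail = gamma ** len(box) / (1.0 - gamma)
            return v + L * diam + tail, v, center

        root = [(0.0, 1.0)] * horizon          # actions normalized to [0, 1]
        b, v, c = score(root)
        heap = [(-b, c, root)]                 # max-heap on the upper bound
        best_val, best_seq = v, c

        for _ in range(budget):
            _, _, box = heapq.heappop(heap)    # most optimistic box so far
            d = max(range(horizon), key=lambda i: box[i][1] - box[i][0])
            lo, hi = box[d]
            mid = (lo + hi) / 2.0
            for half in ((lo, mid), (mid, hi)):  # bisect the widest dimension
                child = list(box)
                child[d] = half
                b, v, c = score(child)
                if v > best_val:
                    best_val, best_seq = v, c
                heapq.heappush(heap, (-b, c, child))
        return best_seq, best_val

For instance, with a toy system such as f = lambda x, u: x + 0.1 * (2 * u - 1) and rho = lambda x, u: max(0.0, 1.0 - abs(x)), the routine returns a near-optimal finite plan; in a receding-horizon loop one would apply only the first action and replan from the resulting state.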
