Lazy Planning under Uncertainty by Optimizing Decisions on an Ensemble of Incomplete Disturbance Trees

This paper addresses discrete-time optimal sequential decision-making problems whose disturbance space $W$ contains a finite number of elements. In this context, the problem of finding, from an initial state $x_0$, an optimal decision strategy can be stated as an optimization problem that seeks an optimal combination of decisions attached to the nodes of a disturbance tree modeling all possible sequences of disturbances $(w_0, w_1, \ldots, w_{T-1}) \in W^T$ over the optimization horizon $T$. A significant drawback of this approach is that the resulting optimization problem has a search space which is the Cartesian product of $O(|W|^{T-1})$ decision spaces $U$, which makes the approach computationally impractical as soon as the optimization horizon grows, even if $W$ has only a handful of elements. To circumvent this difficulty, we propose to exploit an ensemble of randomly generated incomplete disturbance trees of controlled complexity, to solve the induced optimization problems in parallel, and to combine their predictions at time $t = 0$ to obtain a (near-)optimal first-stage decision. Because this approach postpones the determination of the decisions for subsequent stages until additional information about the realization of the uncertain process becomes available, we call it lazy. Simulations carried out on a robot corridor navigation problem show that this approach can lead to near-optimal decisions even with small incomplete trees.
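To make the idea concrete, here is a minimal, hypothetical sketch of the approach on a toy corridor-navigation problem. It is not the paper's actual formulation or benchmark: the dynamics (`step`), cost, horizon, disturbance set, and all function names are illustrative assumptions. Each tree is generated by keeping only a random subset of the $|W|$ disturbance branches at every node (an incomplete tree), decisions attached to the nodes are optimized by backward recursion, and the ensemble's first-stage decisions are combined by majority vote:

```python
import random
from collections import Counter

W = (-1, 0, +1)   # finite disturbance set (toy assumption)
U = (-1, 0, +1)   # decision space (toy assumption)
GOAL, T = 5, 4    # target cell and optimization horizon

def step(x, u, w):
    """Toy corridor dynamics: move by u, drift by w, stay inside cells 0..9."""
    return max(0, min(9, x + u + w))

def cost(x):
    """Per-stage cost: distance to the goal cell."""
    return abs(x - GOAL)

def sample_tree(t, branching, rng):
    """Randomly generate an incomplete disturbance tree: each node keeps only
    `branching` of the |W| possible disturbance branches."""
    if t == T:
        return {}
    return {w: sample_tree(t + 1, branching, rng) for w in rng.sample(W, branching)}

def solve(tree, x):
    """Optimize the decisions attached to the nodes of one tree by backward
    recursion; returns (cost-to-go estimate, optimal decision at this node)."""
    if not tree:                      # leaf: terminal cost only
        return cost(x), None
    best_cost, best_u = min(
        (sum(solve(sub, step(x, u, w))[0] for w, sub in tree.items()) / len(tree), u)
        for u in U
    )
    return cost(x) + best_cost, best_u

def lazy_first_stage(x0, n_trees=21, branching=2, seed=0):
    """Solve an ensemble of incomplete trees in sequence (they could equally be
    solved in parallel) and combine first-stage decisions by majority vote."""
    rng = random.Random(seed)
    votes = Counter(solve(sample_tree(0, branching, rng), x0)[1]
                    for _ in range(n_trees))
    return votes.most_common(1)[0][0]
```

Only the first-stage decision returned by `lazy_first_stage` is acted upon; decisions for later stages are recomputed once the next disturbance has actually been observed, which is what makes the scheme lazy.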
