Planning with Robust (L)RTDP

Stochastic Shortest Path problems (SSPs), a subclass of Markov Decision Problems (MDPs), can be solved efficiently using Real-Time Dynamic Programming (RTDP). Yet MDP models are often uncertain (obtained through statistics or guessing). The usual approach is robust planning: searching for the best policy under the worst model. This paper shows how RTDP can be made robust in the common case where transition probabilities are known to lie in a given interval.
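As a rough illustration of the idea (not the paper's implementation), the sketch below shows a robust Bellman backup for an SSP whose transition probabilities are only known to lie in intervals [p_lo, p_hi], together with a bare-bones RTDP trial loop. All names (worst_case_dist, robust_backup, rtdp_trial) and the toy MDP structure are illustrative assumptions.

```python
# Minimal sketch: robust (worst-model) value backup with interval
# transition probabilities, plus a simple RTDP trial. Assumed structure:
# mdp[state][action] = (cost, successors, lower_bounds, upper_bounds).
import random


def worst_case_dist(succ, p_lo, p_hi, V):
    """Worst (cost-maximising) distribution consistent with the intervals.

    Start each successor at its lower bound, then greedily assign the
    remaining probability mass to the successors with the highest
    cost-to-go, respecting the upper bounds.
    """
    probs = dict(zip(succ, p_lo))
    slack = 1.0 - sum(p_lo)
    for s, lo, hi in sorted(zip(succ, p_lo, p_hi), key=lambda t: -V[t[0]]):
        add = min(hi - lo, slack)
        probs[s] += add
        slack -= add
        if slack <= 0:
            break
    return probs


def robust_backup(s, mdp, V):
    """Return (greedy action, robust value): min over actions of the
    worst-case expected cost under the interval model."""
    best_a, best_q = None, float("inf")
    for a, (cost, succ, p_lo, p_hi) in mdp[s].items():
        probs = worst_case_dist(succ, p_lo, p_hi, V)
        q = cost + sum(p * V[sp] for sp, p in probs.items())
        if q < best_q:
            best_a, best_q = a, q
    return best_a, best_q


def rtdp_trial(s0, goal, mdp, V, max_steps=100):
    """One RTDP trial: robust backup at the current state, then sample a
    successor (here from the worst-case model, one possible choice)."""
    s = s0
    for _ in range(max_steps):
        if s == goal:
            break
        a, V[s] = robust_backup(s, mdp, V)
        cost, succ, p_lo, p_hi = mdp[s][a]
        probs = worst_case_dist(succ, p_lo, p_hi, V)
        s = random.choices(list(probs), weights=list(probs.values()))[0]


if __name__ == "__main__":
    # Tiny 3-state example: states 0 and 1, goal state 2.
    mdp = {
        0: {"a": (1.0, [1, 2], [0.4, 0.3], [0.7, 0.6])},
        1: {"a": (1.0, [0, 2], [0.1, 0.5], [0.5, 0.9])},
        2: {},
    }
    V = {0: 0.0, 1: 0.0, 2: 0.0}  # goal value stays 0
    for _ in range(50):
        rtdp_trial(0, 2, mdp, V)
    print({s: round(v, 3) for s, v in V.items()})
```

The greedy mass-shifting step in worst_case_dist is one standard way to compute the worst-case distribution under interval constraints; labelled convergence checks (the "L" in LRTDP) are omitted here for brevity.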
