Robust LRTDP: Reachability Analysis

Stochastic Shortest Path problems (SSPs) can be solved efficiently with the Real-Time Dynamic Programming algorithm (RTDP). However, RTDP requires that a goal state always be reachable. This paper presents an algorithm that checks goal reachability, in particular in the more complex case of an uncertain SSP where only an interval of possible values is known for each transition probability. This provides a way to determine whether SSP algorithms such as RTDP are applicable, even when the exact model is not known. We aim for a symbolic analysis in order to avoid enumerating the complete state space.
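
To make the reachability question concrete, the following is a minimal explicit-state sketch in Python. It is not the symbolic algorithm of the paper: it enumerates states, which is exactly what the paper aims to avoid, and it only tests for the existence of a positive-probability path to a goal, which is a weaker condition than the existence of a policy that reaches the goal with probability 1. The function name `can_reach_goal`, the dictionary layout of the model, and the example states are all assumptions made for illustration. An edge is treated as guaranteed when its lower probability bound is positive (it exists in every model consistent with the intervals) and as merely possible when only its upper bound is positive.

```python
from collections import deque


def can_reach_goal(transitions, goals, guaranteed=True):
    """Return the set of states that have a path to some goal state.

    transitions: dict mapping (state, action) -> {next_state: (p_min, p_max)}
    guaranteed=True  keeps only edges with p_min > 0 (present in every model)
    guaranteed=False keeps edges with p_max > 0 (present in at least one model)
    """
    # Build the reversed graph restricted to the selected edges.
    predecessors = {}
    for (state, _action), successors in transitions.items():
        for next_state, (p_min, p_max) in successors.items():
            bound = p_min if guaranteed else p_max
            if bound > 0.0:
                predecessors.setdefault(next_state, set()).add(state)

    # Backward breadth-first search from the goal states.
    reached = set(goals)
    queue = deque(goals)
    while queue:
        current = queue.popleft()
        for pred in predecessors.get(current, ()):
            if pred not in reached:
                reached.add(pred)
                queue.append(pred)
    return reached


# Hypothetical four-state model: "g" is the goal, "s2" is a dead end.
example_transitions = {
    ("s0", "a"): {"s1": (0.0, 0.5), "s0": (0.5, 1.0)},
    ("s1", "a"): {"g": (0.2, 0.9), "s1": (0.1, 0.8)},
    ("s2", "a"): {"s2": (1.0, 1.0)},
}
example_goals = {"g"}

surely_connected = can_reach_goal(example_transitions, example_goals, guaranteed=True)
possibly_connected = can_reach_goal(example_transitions, example_goals, guaranteed=False)
```

In this example `s0` reaches the goal only in models where the transition from `s0` to `s1` has positive probability, so it appears in `possibly_connected` but not in `surely_connected`, while `s2` appears in neither. States in that gap are the ones for which the applicability of RTDP depends on the unknown exact model.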
