Probabilistic Temporal Planning

Planning research has explored the issues that arise when planning with concurrent, durative actions. Separately, planners that can cope with probabilistic effects have also been developed. However, few attempts have been made to combine probabilistic effects and concurrent durative actions in a single planner, and the principal attempt of which we are aware was targeted at a specific domain. We present a unified framework for probabilistic temporal planning. The framework supports actions whose probabilistic outcomes may have different durations, and it does not restrict an action’s effects to its start and end points. We have tailored a deterministic search algorithm specifically to this framework: it combines elements of both LRTDP and AO*, and it uses a search space designed to limit the impact of exponential growth. Most search algorithms benefit from heuristic guidance, and we describe several ways of applying heuristics to probabilistic temporal planning, including a framework for deriving heuristics from the planning graph data structure. Finally, as the Planning Domain Definition Language (PDDL) is considered the standard language for defining planning domains and their associated problems, we present an extension to PDDL that supports probabilistic temporal planning.
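
As background for the search algorithm summarised above, the sketch below shows plain LRTDP (labelled real-time dynamic programming) over a generic, explicitly represented MDP. It is a minimal illustration only: the `MDP` interface and every name in it are assumptions made for this example, and the sketch does not reproduce the combined LRTDP/AO* algorithm or the probabilistic temporal search space developed in this work.

```python
import random

class MDP:
    """A minimal explicit MDP interface used only for this illustration.

    actions(s):      applicable actions in state s (assumed non-empty for non-goals)
    transitions(s, a): list of (next_state, probability) pairs
    cost(s, a):      non-negative action cost
    goal(s):         True if s is a goal state
    h(s):            admissible heuristic estimate of the cost-to-go from s
    """
    def __init__(self, actions, transitions, cost, goal, h):
        self.actions = actions
        self.transitions = transitions
        self.cost = cost
        self.goal = goal
        self.h = h

def lrtdp(mdp, s0, epsilon=1e-3, max_trials=10000):
    """Labelled RTDP: run trials from s0 until s0 is labelled solved, i.e. every
    state reachable under the greedy policy has Bellman residual below epsilon."""
    V = {}            # value function, initialised lazily from the heuristic
    solved = set()

    def value(s):
        return 0.0 if mdp.goal(s) else V.setdefault(s, mdp.h(s))

    def qvalue(s, a):
        return mdp.cost(s, a) + sum(p * value(t) for t, p in mdp.transitions(s, a))

    def greedy(s):
        return min(mdp.actions(s), key=lambda a: qvalue(s, a))

    def residual(s):
        return abs(value(s) - qvalue(s, greedy(s)))

    def check_solved(s):
        rv, open_, closed = True, [], []
        if s not in solved:
            open_.append(s)
        while open_:
            s = open_.pop()
            closed.append(s)
            if residual(s) > epsilon:
                rv = False
                continue
            for t, p in mdp.transitions(s, greedy(s)):
                if p > 0 and t not in solved and not mdp.goal(t) \
                        and t not in open_ and t not in closed:
                    open_.append(t)
        if rv:
            solved.update(closed)           # label the whole greedy envelope
        else:
            for s in reversed(closed):      # otherwise, do Bellman updates
                V[s] = qvalue(s, greedy(s))
        return rv

    for _ in range(max_trials):
        if s0 in solved or mdp.goal(s0):
            break
        visited, s = [], s0
        while s not in solved and not mdp.goal(s):     # run one trial
            visited.append(s)
            a = greedy(s)
            V[s] = qvalue(s, a)                        # Bellman update
            nexts, probs = zip(*mdp.transitions(s, a))
            s = random.choices(nexts, probs)[0]        # sample an outcome
        while visited and check_solved(visited.pop()): # label states backwards
            pass
    return V, greedy
```

The labelling step is what distinguishes LRTDP from plain RTDP: once every state reachable under the greedy policy has a small Bellman residual, the whole envelope is marked solved and is never revisited, which makes convergence detectable.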

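The planning-graph-based heuristics mentioned above build on the classical relaxed planning graph. As a point of reference only, here is a minimal sketch, under the assumption of a STRIPS-like abstraction with delete effects ignored, of the level heuristic that such graphs support; the heuristics developed in this work adapt the idea to the probabilistic temporal setting, which this sketch does not attempt.

```python
def relaxed_graph_levels(init, goals, actions):
    """Delete-relaxation planning graph: grow fact layers until all goals appear,
    recording the level at which each proposition is first reachable.

    init:    set of propositions true initially
    goals:   set of goal propositions
    actions: iterable of (preconditions, add_effects) pairs of frozensets

    Returns the first level at which every goal has appeared (a max-style level
    heuristic), or None if some goal is unreachable even in the relaxation.
    """
    level = {p: 0 for p in init}   # first level at which each fact appears
    facts = set(init)
    depth = 0
    while not goals <= facts:
        depth += 1
        new_facts = set()
        for pre, add in actions:
            if pre <= facts:       # action applicable in the relaxation
                new_facts |= add - facts
        if not new_facts:          # graph has levelled off: goal unreachable
            return None
        for p in new_facts:
            level[p] = depth
        facts |= new_facts
    return max(level[g] for g in goals)

# Hypothetical two-step example: move to a site, then take a sample.
acts = [
    (frozenset({"at-a"}), frozenset({"at-b"})),
    (frozenset({"at-b"}), frozenset({"have-sample"})),
]
print(relaxed_graph_levels({"at-a"}, {"have-sample"}, acts))  # prints 2
```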