Concurrent Probabilistic Temporal Planning with Policy-Gradients

We present an any-time concurrent probabilistic temporal planner that handles continuous and discrete uncertainties and metric functions. Our approach is a direct policy search that attempts to optimise a parameterised policy using gradient ascent. Low memory use, function approximation, and factorisation of the policy together allow us to scale to challenging domains. This Factored Policy Gradient (FPG) planner also attempts to optimise both the number of steps to the goal and the probability of success. We compare the FPG planner to other planners on concurrent probabilistic temporal planning (CPTP) domains, and on simpler but better-studied probabilistic non-temporal domains.
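To make "factored policy" and "gradient ascent" concrete, here is a minimal sketch under stated assumptions, not the FPG planner's actual implementation. It assumes each action has its own independent logistic controller that decides whether to start that action given an observation vector, so the joint policy factors into a product of per-action Bernoulli choices. The learning rule is plain one-step REINFORCE (likelihood-ratio gradient) with a running-average baseline, swapped in for simplicity in place of the planner's own gradient estimator. The class `FactoredPolicy`, the function `train`, and the toy reward (success only if action 0 starts and action 1 does not) are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class FactoredPolicy:
    """Hypothetical factored policy: one logistic 'start this action?' controller per action."""

    def __init__(self, n_actions, n_features):
        self.theta = [[0.0] * n_features for _ in range(n_actions)]

    def prob_start(self, a, obs):
        return sigmoid(sum(w * x for w, x in zip(self.theta[a], obs)))

    def sample(self, obs, eligible, rng):
        """Sample a set of actions to start; return it with grad log-prob per action."""
        chosen, grads = [], {}
        for a in eligible:
            p = self.prob_start(a, obs)
            start = rng.random() < p
            if start:
                chosen.append(a)
            # d/dtheta log Bernoulli(start | sigmoid(theta.obs)) = (start - p) * obs
            coeff = (1.0 if start else 0.0) - p
            grads[a] = [coeff * x for x in obs]
        return chosen, grads


def train(steps=2000, lr=0.1, seed=0):
    """Gradient ascent on expected reward for a hypothetical one-step toy problem."""
    rng = random.Random(seed)
    policy = FactoredPolicy(n_actions=2, n_features=1)
    obs = [1.0]  # single constant (bias) feature
    baseline = 0.0
    for _ in range(steps):
        chosen, grads = policy.sample(obs, eligible=[0, 1], rng=rng)
        # Toy "success": start action 0 concurrently, but not action 1.
        reward = 1.0 if (0 in chosen and 1 not in chosen) else 0.0
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)  # running-average baseline
        # Each action's parameters are updated using only its own factor's gradient.
        for a, g in grads.items():
            for i, gi in enumerate(g):
                policy.theta[a][i] += lr * advantage * gi
    return policy


if __name__ == "__main__":
    trained = train()
    print(trained.prob_start(0, [1.0]), trained.prob_start(1, [1.0]))
```

Because the policy factors per action, the parameter count and memory grow linearly with the number of actions rather than with the number of possible concurrent action sets, which is the intuition behind the scaling claim above.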
