Policy-Gradient Methods for Planning

Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning, in the form of a policy-gradient method, to these domains. Our emphasis is on large domains that are infeasible for dynamic programming. Our approach is to construct a simple policy, or agent, for each planning task. The result is a general probabilistic temporal planner, named the Factored Policy-Gradient Planner (FPG-Planner), which can handle hundreds of tasks while optimising for probability of success, duration, and resource use.
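The idea of one simple policy per task can be illustrated with a minimal sketch. It assumes that each task's agent is a logistic policy that decides, from a shared observation vector, whether to start its task, and that all agents receive the same scalar reward. Under that assumption the joint action is a product of independent per-task choices, so the gradient of its log-probability splits into one term per agent and each agent can be updated locally. The names `FactoredPolicy` and `reinforce_step`, the toy reward, and the plain one-step REINFORCE update are hypothetical illustrations, not the FPG-Planner's actual code or gradient estimator.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class FactoredPolicy:
    """Hypothetical sketch: one independent logistic agent per task.

    Agent i starts task i with probability sigmoid(w_i . obs). This is an
    illustration of a factored policy, not the FPG-Planner implementation.
    """

    def __init__(self, n_tasks, n_features):
        self.w = [[0.0] * n_features for _ in range(n_tasks)]

    def start_probs(self, obs):
        # Probability that each agent chooses to start its task.
        return [sigmoid(sum(wi * oi for wi, oi in zip(row, obs))) for row in self.w]

    def log_prob(self, obs, actions):
        # log pi(a | obs) is a sum of independent Bernoulli log-probabilities.
        probs = self.start_probs(obs)
        return sum(math.log(p if a else 1.0 - p) for a, p in zip(actions, probs))

    def grad_log_prob(self, obs, actions):
        # For a Bernoulli-logistic agent, d/dw_i log pi = (a_i - p_i) * obs,
        # so each agent's gradient depends only on its own action.
        probs = self.start_probs(obs)
        return [[(a - p) * o for o in obs] for a, p in zip(actions, probs)]

    def act(self, obs, rng):
        probs = self.start_probs(obs)
        actions = [1 if rng.random() < p else 0 for p in probs]
        return actions, self.grad_log_prob(obs, actions)


def reinforce_step(policy, obs, reward_fn, rng, lr=0.1, baseline=0.0):
    """One likelihood-ratio (REINFORCE) update on a single decision step.

    Every agent is pushed along its own log-probability gradient, scaled
    by the shared reward minus a constant baseline.
    """
    actions, grads = policy.act(obs, rng)
    r = reward_fn(actions)
    for row, g in zip(policy.w, grads):
        for j in range(len(row)):
            row[j] += lr * (r - baseline) * g[j]
    return r
```

As a toy usage example, three agents sharing the observation `[1.0, 1.0]` and rewarded for the fraction of tasks whose start decision matches a hypothetical target pattern `[1, 0, 1]` learn, over a few thousand updates, to start tasks 0 and 2 with high probability and task 1 with low probability, even though no agent sees the others' decisions.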

Douglas Aberdeen