Discriminative Learning of Beam-Search Heuristics for Planning

We consider the problem of learning heuristics for controlling forward state-space beam search in AI planning domains. We draw on a recent framework for "structured output classification" (e.g., syntactic parsing) known as learning as search optimization (LaSO). The LaSO approach uses discriminative learning to optimize heuristic functions for search-based computation of structured outputs and has shown promising results in a number of domains. However, the search problems that arise in AI planning tend to be qualitatively very different from those considered in structured classification, which raises a number of potential difficulties in directly applying LaSO to planning. In this paper, we discuss these issues and describe a LaSO-based approach for discriminative learning of beam-search heuristics in AI planning domains. We give convergence results for this approach and present experiments in several benchmark domains. The results show that the discriminatively trained heuristic can outperform both the heuristic used by the planner FF and the one learned by another recent non-discriminative approach.
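To make the idea concrete, the following is a minimal sketch of a LaSO-style perceptron update for a beam-search heuristic. It is not the implementation from the paper. The toy domain (a walk on the integer line), the two features, the linear heuristic, the learning rate, and all function names are assumptions chosen for illustration. Training follows a known solution path; whenever the beam drops the next node of that path, the weights are moved so that the beam nodes score worse and the target node scores better, and the beam is reset to the target node.

```python
from typing import List, Optional, Sequence


def successors(state: int) -> List[int]:
    # Toy domain (assumption): walk on the integer line, "left" listed first.
    return [state - 1, state + 1]


def features(state: int, goal: int) -> List[float]:
    # Hypothetical features: distance to the goal and a constant bias term.
    return [float(abs(state - goal)), 1.0]


def heuristic(w: Sequence[float], state: int, goal: int) -> float:
    # Linear heuristic; lower values are treated as more promising.
    return sum(wi * fi for wi, fi in zip(w, features(state, goal)))


def expand(beam: Sequence[int], w: Sequence[float], goal: int, width: int) -> List[int]:
    # Generate all successors, drop duplicates, keep the `width` best-scoring ones.
    candidates = list(dict.fromkeys(s for b in beam for s in successors(b)))
    candidates.sort(key=lambda s: heuristic(w, s, goal))  # stable sort keeps ties in order
    return candidates[:width]


def train(paths: Sequence[Sequence[int]], width: int,
          epochs: int = 5, lr: float = 1.0) -> List[float]:
    w = [0.0, 0.0]
    for _ in range(epochs):
        for path in paths:
            goal, beam = path[-1], [path[0]]
            for target in path[1:]:
                beam = expand(beam, w, goal, width)
                if target not in beam:
                    # Search error: raise the score of the beam nodes (on average)
                    # and lower the score of the target node, then reset the beam.
                    rows = [features(s, goal) for s in beam]
                    avg = [sum(col) / len(beam) for col in zip(*rows)]
                    tgt = features(target, goal)
                    w = [wi + lr * (a - t) for wi, a, t in zip(w, avg, tgt)]
                    beam = [target]
    return w


def beam_search(start: int, goal: int, w: Sequence[float],
                width: int, max_depth: int = 50) -> Optional[List[int]]:
    beam = [[start]]  # each beam entry is the path leading to its last state
    for _ in range(max_depth):
        for path in beam:
            if path[-1] == goal:
                return path
        children = {}
        for path in beam:
            for s in successors(path[-1]):
                children.setdefault(s, path + [s])
        ranked = sorted(children.values(), key=lambda p: heuristic(w, p[-1], goal))
        beam = ranked[:width]
    return next((p for p in beam if p[-1] == goal), None)


weights = train([[0, 1, 2, 3, 4, 5]], width=1)
plan = beam_search(-2, 3, weights, width=1)
```

In this toy setting a single training path and a beam width of 1 are enough: the untrained (all-zero) heuristic always steps left and fails, while the trained weights guide the search to a goal that was not seen during training. Real planning domains would need richer features and wider beams, so this should be read only as an illustration of the update rule.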
