Approximate Policy Iteration with a Policy Language Bias

We explore approximate policy iteration, replacing the usual cost-function learning step with a learning step in policy space. We give policy-language biases that enable the solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving those domains as extremely large MDPs.
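
To make the loop the abstract describes concrete, here is a minimal Python sketch of approximate policy iteration where rollout-based policy improvement supplies training examples and the "learning step in policy space" fits a new policy to them. The simulator interface (`mdp.sample`, `mdp.actions`, `mdp.sample_states`) and the `learner.fit` policy inducer are hypothetical placeholders, not the paper's actual implementation; the policy-language bias would live inside `learner`.

```python
def estimate_q(mdp, state, action, policy, horizon, width):
    """Monte Carlo estimate of Q^pi(state, action): take `action` once,
    then follow `policy` for `horizon` steps, averaged over `width` rollouts."""
    total = 0.0
    for _ in range(width):
        s, a, ret = state, action, 0.0
        for _ in range(horizon):
            s, r = mdp.sample(s, a)   # simulate one stochastic transition
            ret += r
            a = policy(s)             # follow the current policy afterwards
        total += ret
    return total / width

def improved_action(mdp, state, policy, horizon, width):
    """Policy improvement at one state: act greedily w.r.t. rollout Q-values."""
    return max(mdp.actions(state),
               key=lambda a: estimate_q(mdp, state, a, policy, horizon, width))

def approximate_policy_iteration(mdp, learner, init_policy,
                                 iterations=10, n_states=200,
                                 horizon=20, width=5):
    """Generic loop: draw training states, compute rollout-improved actions,
    then induce a new policy in the restricted policy language."""
    policy = init_policy
    for _ in range(iterations):
        examples = [(s, improved_action(mdp, s, policy, horizon, width))
                    for s in mdp.sample_states(n_states)]
        policy = learner.fit(examples)   # learning step in policy space
    return policy
```

In this sketch `policy` is any callable from states to actions, so `init_policy` can be as simple as a random-action rule; the bias the paper studies would correspond to restricting what `learner.fit` can return (e.g., decision lists over a relational concept language) rather than approximating a cost function.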
