Approximate Policy Iteration with a Policy Language Bias

We explore approximate policy iteration, replacing the usual cost-function learning step with a learning step in policy space. We give policy-language biases that enable the solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving those domains as extremely large MDPs.
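
To make the loop the abstract describes concrete, here is a minimal Python sketch of approximate policy iteration where rollout-based policy improvement supplies training examples and the "learning step in policy space" fits a new policy to them. The simulator interface (`mdp.sample`, `mdp.actions`, `mdp.sample_states`) and the `learner.fit` policy inducer are hypothetical placeholders, not the paper's actual implementation; the policy-language bias would live inside `learner`.

```python
def estimate_q(mdp, state, action, policy, horizon, width):
    """Monte Carlo estimate of Q^pi(state, action): take `action` once,
    then follow `policy` for `horizon` steps, averaged over `width` rollouts."""
    total = 0.0
    for _ in range(width):
        s, a, ret = state, action, 0.0
        for _ in range(horizon):
            s, r = mdp.sample(s, a)   # simulate one stochastic transition
            ret += r
            a = policy(s)             # follow the current policy afterwards
        total += ret
    return total / width

def improved_action(mdp, state, policy, horizon, width):
    """Policy improvement at one state: act greedily w.r.t. rollout Q-values."""
    return max(mdp.actions(state),
               key=lambda a: estimate_q(mdp, state, a, policy, horizon, width))

def approximate_policy_iteration(mdp, learner, init_policy,
                                 iterations=10, n_states=200,
                                 horizon=20, width=5):
    """Generic loop: draw training states, compute rollout-improved actions,
    then induce a new policy in the restricted policy language."""
    policy = init_policy
    for _ in range(iterations):
        examples = [(s, improved_action(mdp, s, policy, horizon, width))
                    for s in mdp.sample_states(n_states)]
        policy = learner.fit(examples)   # learning step in policy space
    return policy
```

In this sketch `policy` is any callable from states to actions, so `init_policy` can be as simple as a random-action rule; the bias the paper studies would correspond to restricting what `learner.fit` can return (e.g., decision lists over a relational concept language) rather than approximating a cost function.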
