论文信息 - Generalized Prioritized Sweeping

Generalized Prioritized Sweeping

Prioritized sweeping is a model-based reinforcement learning method that attempts to focus an agent's limited computational resources to achieve a good estimate of the value of environment states. To choose effectively where to spend a costly planning step, classic prioritized sweeping uses a simple heuristic to focus computation on the states that are likely to have the largest errors. In this paper, we introduce generalized prioritized sweeping, a principled method for generating such estimates in a representation-specific manner. This allows us to extend prioritized sweeping beyond an explicit, state-based representation to deal with compact representations that are necessary for dealing with large state spaces. We apply this method for generalized model approximators (such as Bayesian networks), and describe preliminary experiments that compare our approach with classical prioritized sweeping.

[1] Keiji Kanazawa,et al. A model for reasoning about persistence and causation , 1989 .

[2] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.

[3] Ronald J. Williams,et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .

[4] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.

[5] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .

[6] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[7] Scott Davies,et al. Multidimensional Triangulation and Interpolation for Reinforcement Learning , 1996, NIPS.

[8] Prasad Tadepalli,et al. Scaling Up Average Reward Reinforcement Learning by Approximating the Domain Models and the Value Function , 1996, ICML.

[9] Nir Friedman,et al. Sequential Update of Bayesian Network Structure , 1997, UAI.