Decision Tree Methods for Finding Reusable MDP Homomorphisms

State abstraction is a useful tool for agents interacting with complex environments. Good state abstractions are compact, reusable, and easy to learn from sample data. This paper combines and extends two existing classes of state abstraction methods to achieve all three criteria. The first class of methods searches for MDP homomorphisms (Ravindran 2004), which yield models of reward and transition probabilities in an abstract state space. The second class, exemplified by the UTree algorithm (McCallum 1995), learns compact models of the value function quickly from sample data. Models based on MDP homomorphisms can easily be extended so that they remain usable across tasks with similar reward functions, whereas value-based methods like UTree cannot be extended in this way. We present a new, combined algorithm, together with results showing that it satisfies all three criteria: the resulting models are compact, can be learned quickly from sample data, and can be used across a class of reward functions.
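For reference, the homomorphism conditions underlying the first class of methods can be sketched as follows; this is the standard Ravindran and Barto formulation, and the notation is assumed for illustration rather than quoted from this paper. A surjection $h = (f, \{g_s\})$ from an MDP $M = (S, A, P, R)$ onto an abstract MDP $M' = (S', A', P', R')$ is an MDP homomorphism when, for all states $s, s' \in S$ and actions $a \in A$,

\[
  P'\bigl(f(s), g_s(a), f(s')\bigr) = \sum_{s'' \in f^{-1}(f(s'))} P(s, a, s''), \qquad
  R'\bigl(f(s), g_s(a)\bigr) = R(s, a).
\]

Only the second condition depends on the reward function, which is why abstractions of this kind are natural candidates for reuse across tasks with similar reward functions, as claimed above.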

[1] Leslie Pack Kaelbling et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons. IJCAI, 1991.

[2] Andrew McCallum et al. Reinforcement Learning with Selective Perception and Hidden State. 1996.

[3] Doina Precup et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999.

[4] Jesse Hoey et al. SPUDD: Stochastic Planning Using Decision Diagrams. UAI, 1999.

[5] Robert Givan et al. Bounded-Parameter Markov Decision Processes. Artificial Intelligence, 2000.

[6] Andrew G. Barto et al. Automated State Abstraction for Options Using the U-Tree Algorithm. NIPS, 2000.

[7] Craig Boutilier et al. Symbolic Dynamic Programming for First-Order MDPs. IJCAI, 2001.

[8] Bernhard Hengst et al. Discovering Hierarchy in Reinforcement Learning with HEXQ. ICML, 2002.

[9] Craig Boutilier et al. Value-Directed Compression of POMDPs. NIPS, 2002.

[10] Balaraman Ravindran et al. SMDP Homomorphisms: An Algebraic Approach to Abstraction in Semi-Markov Decision Processes. IJCAI, 2003.

[11] Robert Givan et al. Equivalence Notions and Model Minimization in Markov Decision Processes. Artificial Intelligence, 2003.

[12] Luc De Raedt et al. Logical Markov Decision Programs. 2003.

[13] Andrew G. Barto et al. An Algebraic Approach to Abstraction in Reinforcement Learning. 2004.

[14] Pierre Geurts et al. Tree-Based Batch Mode Reinforcement Learning. Journal of Machine Learning Research, 2005.

[15] Robert Givan et al. Feature-Discovering Approximate Value Iteration Methods. SARA, 2005.

[16] Dana H. Ballard et al. Learning to Perceive and Act by Trial and Error. Machine Learning, 1991.

[17] Justus H. Piater et al. Interactive Learning of Mappings from Visual Percepts to Actions. ICML, 2005.

[18] Andrew G. Barto et al. A Causal Approach to Hierarchical Decomposition of Factored MDPs. ICML, 2005.