Model Minimization in Hierarchical Reinforcement Learning

When applied to real-world problems, Markov Decision Processes (MDPs) often exhibit considerable implicit redundancy, especially when there are symmetries in the problem. In this article we present an MDP minimization framework based on homomorphisms. The framework exploits redundancy and symmetry to derive smaller equivalent models of the problem. We then apply our minimization ideas to the options framework to derive relativized options, i.e., options defined without an absolute frame of reference. We demonstrate their utility empirically, even in cases where the minimization criteria are not met exactly.
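The homomorphism conditions behind this kind of minimization can be sketched concretely: a state map h and a per-state action map g define a valid homomorphism when they preserve rewards and aggregate transition probabilities onto blocks of states. The sketch below, with an illustrative toy MDP and hypothetical names (not from the article), checks these two conditions for a two-state MDP that is symmetric under swapping its states.

```python
# Sketch of checking the MDP homomorphism conditions for a given
# state map h and per-state action map g. The toy MDP and all names
# here are illustrative assumptions, not the article's notation.

def is_homomorphism(P, R, Pr, Rr, h, g):
    """Check, for every state-action pair (s, a):
      reward preservation:   R[s][a] == Rr[h(s)][g(s, a)]
      transition aggregation: Pr[h(s)][g(s, a)][B] equals the total
        probability P[s][a][s2] over all s2 with h(s2) == B.
    P, R describe the original MDP; Pr, Rr the reduced model.
    """
    for s in P:
        for a in P[s]:
            hs, ga = h(s), g(s, a)
            # Reward must be invariant under the mapping.
            if abs(R[s][a] - Rr[hs][ga]) > 1e-9:
                return False
            # Probability mass into each block B must match.
            for B in Pr[hs][ga]:
                mass = sum(p for s2, p in P[s][a].items() if h(s2) == B)
                if abs(Pr[hs][ga][B] - mass) > 1e-9:
                    return False
    return True

# Toy 2-state MDP, symmetric under swapping states 0 and 1:
# action 'a' deterministically moves to the other state, reward 1.
P = {0: {'a': {1: 1.0}}, 1: {'a': {0: 1.0}}}
R = {0: {'a': 1.0}, 1: {'a': 1.0}}

# Reduced one-state model obtained by collapsing the symmetric pair.
Pr = {'*': {'a': {'*': 1.0}}}
Rr = {'*': {'a': 1.0}}
h = lambda s: '*'        # both states map to the single block '*'
g = lambda s, a: a       # actions map to themselves

print(is_homomorphism(P, R, Pr, Rr, h, g))  # True
```

Collapsing the two symmetric states yields a one-state model on which a policy can be learned and then lifted back to the original MDP, which is the intuition behind defining options relative to such a reduced model.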
