PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning

We present PolicyBlocks, an algorithm by which a reinforcement learning agent can extract useful macro-actions from a set of related tasks. The agent creates macro-actions by finding commonalities in its solutions to previous tasks. Using these macro-actions accelerates learning on future related tasks. We illustrate this speedup in a “rooms” grid-world, where the macro-actions found by PolicyBlocks outperform even hand-designed macro-actions, and in a hydroelectric reservoir control task. We also provide empirical comparisons of PolicyBlocks with the Reuse options of Bernstein (1999) and the SKILLS algorithm of Thrun and Schwartz (1995), which elucidate the conditions under which each algorithm performs well.
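The abstract says only that macro-actions are built from commonalities among earlier solutions. As a rough illustration of that idea, and not a reproduction of the PolicyBlocks procedure, the Python sketch below assumes each solved task yields a deterministic policy stored as a dict from state to action. It treats the states on which several policies choose the same action as a candidate "block", scores a block by its size times the number of stored policies that contain it, and repeatedly extracts the best block. The scoring rule, the `min_size` cutoff, the subtraction step, and all function names (`intersect`, `contains`, `score`, `best_common_block`, `extract_macro_actions`) are hypothetical choices made for this example.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Optional

# Hypothetical representation: a deterministic (possibly partial) policy
# maps each state it covers to the action it takes there.
Policy = Dict[Hashable, Hashable]


def intersect(policies: List[Policy]) -> Policy:
    """Partial policy on which every policy in `policies` picks the same action."""
    first, *rest = policies
    return {s: a for s, a in first.items() if all(p.get(s) == a for p in rest)}


def contains(policy: Policy, block: Policy) -> bool:
    """True if `policy` takes the block's action in every state of the block."""
    return all(policy.get(s) == a for s, a in block.items())


def score(block: Policy, policies: List[Policy]) -> int:
    """Assumed score: block size times the number of policies containing it."""
    return len(block) * sum(contains(p, block) for p in policies)


def best_common_block(policies: List[Policy], min_size: int = 2) -> Optional[Policy]:
    """Highest-scoring intersection over every subset of two or more policies."""
    best: Optional[Policy] = None
    best_score = 0
    for k in range(2, len(policies) + 1):
        for subset in combinations(policies, k):
            block = intersect(list(subset))
            if len(block) < min_size:
                continue  # too small to be a useful macro-action
            s = score(block, policies)
            if s > best_score:
                best, best_score = block, s
    return best


def extract_macro_actions(
    policies: List[Policy], max_blocks: int = 5, min_size: int = 2
) -> List[Policy]:
    """Repeatedly pull out the best shared block, then remove its
    state-action pairs from every policy that agrees with them."""
    remaining = [dict(p) for p in policies]  # copy so the inputs are not mutated
    blocks: List[Policy] = []
    for _ in range(max_blocks):
        block = best_common_block(remaining, min_size)
        if block is None:
            break
        blocks.append(block)
        for p in remaining:
            for s, a in block.items():
                if p.get(s) == a:
                    del p[s]
    return blocks


if __name__ == "__main__":
    # Three toy policies over states 0..4 that share a run of "R" moves.
    p1 = {0: "R", 1: "R", 2: "R", 3: "U", 4: "U"}
    p2 = {0: "R", 1: "R", 2: "R", 3: "U", 4: "L"}
    p3 = {0: "R", 1: "R", 2: "R", 3: "D", 4: "D"}
    print(extract_macro_actions([p1, p2, p3]))  # [{0: 'R', 1: 'R', 2: 'R'}]
```

Enumerating every subset of stored policies is exponential in their number, so this brute-force search only suits a handful of tasks. One natural way to use a returned block as a macro-action, in the spirit of the options framework, is to follow the block's action while the current state lies in its domain and terminate once the agent leaves it; that wrapper is likewise an assumption of this sketch rather than a detail given in the abstract.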

[1]  John D. C. Little, et al. The Use of Storage Water in a Hydroelectric System, 1955, Oper. Res.

[2]  Saul Amarel, et al. On representations of problems of reasoning about actions, 1968.

[3]  H. A. Simon, et al. The theory of learning by doing, 1979, Psychological Review.

[4]  C. Watkins. Learning from delayed rewards, 1989.

[5]  Glenn A. Iba, et al. A heuristic approach to the discovery of macro-operators, 2004, Machine Learning.

[6]  Sebastian Thrun, et al. Finding Structure in Reinforcement Learning, 1994, NIPS.

[7]  Richard S. Sutton, et al. Roles of Macro-Actions in Accelerating Reinforcement Learning, 1998.

[8]  Michael H. Bowling, et al. Reusing Learned Policies Between Similar Problems, 1998.

[9]  Bruce L. Digney, et al. Learning hierarchical control structures for multiple tasks and changing environments, 1998.

[10]  Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.

[11]  Thomas G. Dietterich. State Abstraction in MAXQ Hierarchical Reinforcement Learning, 1999, NIPS.

[12]  Daniel S. Bernstein, et al. Reusing Old Policies to Accelerate Learning on New MDPs, 1999.

[13]  Doina Precup, et al. Temporal abstraction in reinforcement learning, 2000, ICML.

[14]  Andrew G. Barto, et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density, 2001, ICML.

[15]  Amy McGovern. Autonomous Discovery of Abstractions through Interaction with an Environment, 2002, SARA.