Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density

This paper presents a method by which a reinforcement learning agent can automatically discover certain types of subgoals online. By creating useful new subgoals while learning, the agent is able to accelerate learning on the current task and to transfer its expertise to other, related tasks through the reuse of its ability to attain subgoals. The agent discovers subgoals based on commonalities across multiple paths to a solution. We cast the task of finding these commonalities as a multiple-instance learning problem and use the concept of diverse density to find solutions. We illustrate this approach using several gridworld tasks.
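
The abstract does not spell out how diverse density is computed, so the following is only a minimal sketch of one common formulation: the noisy-or model with a Gaussian-like instance similarity. It treats each successful trajectory as a positive bag of states and each unsuccessful trajectory as a negative bag, then picks the visited state with the highest diverse density as a candidate subgoal. The function names, the `scale` parameter, the `exclude` argument, and the toy trajectories are all illustrative assumptions, not the paper's implementation.

```python
import math

def instance_prob(state, target, scale=5.0):
    # Assumed instance model: similarity decays with squared distance.
    d2 = sum((a - b) ** 2 for a, b in zip(state, target))
    return math.exp(-scale * d2)

def diverse_density(target, positive_bags, negative_bags, scale=5.0):
    # Noisy-or diverse density: high when every positive bag has a state
    # near `target` and no negative bag does.
    dd = 1.0
    for bag in positive_bags:
        miss = 1.0
        for s in bag:
            miss *= 1.0 - instance_prob(s, target, scale)
        dd *= 1.0 - miss  # probability that some state in the bag matches
    for bag in negative_bags:
        for s in bag:
            dd *= 1.0 - instance_prob(s, target, scale)
    return dd

def best_subgoal(positive_bags, negative_bags, scale=5.0, exclude=()):
    # Candidates are the states seen on successful trajectories.
    # `exclude` is a hypothetical way to skip trivial commonalities
    # such as the goal state itself.
    candidates = {s for bag in positive_bags for s in bag} - set(exclude)
    return max(
        sorted(candidates),
        key=lambda t: diverse_density(t, positive_bags, negative_bags, scale),
    )

# Toy two-room gridworld (illustrative data): every successful path
# passes through the doorway at (2, 1) on its way to the goal at (4, 2).
positive_bags = [
    [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)],
    [(0, 2), (1, 2), (1, 1), (2, 1), (3, 2), (4, 2)],
    [(0, 1), (1, 1), (2, 1), (3, 0), (4, 1), (4, 2)],
]
# Unsuccessful paths wander in the first room and never reach the doorway.
negative_bags = [
    [(0, 0), (0, 1), (1, 1), (1, 0)],
    [(0, 2), (1, 2), (0, 1)],
]

subgoal = best_subgoal(positive_bags, negative_bags, exclude=[(4, 2)])
print(subgoal)
```

In this toy setup the doorway is the only non-goal state shared by all successful paths and absent from the unsuccessful ones, which is the kind of commonality the abstract describes. States that also appear in a negative bag score zero under this model.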
