Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery

Skill discovery algorithms in reinforcement learning typically identify single states or regions in state space that correspond to task-specific subgoals. However, such methods do not directly address the question of how many distinct skills are appropriate for solving the tasks that the agent faces. This can be highly inefficient when many identified subgoals correspond to the same underlying skill, but each is used as a separate skill goal. Furthermore, skills created in this manner are often transferable only to tasks that share identical state spaces, since corresponding subgoals across tasks are not merged into a single skill goal. We show that these problems can be overcome by clustering subgoal data defined in an agent-space and using the resulting clusters as templates for skill termination conditions. Clustering via a Dirichlet process mixture model is used to discover a minimal, sufficient collection of portable skills.
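
The core mechanism described above, clustering agent-space subgoal observations with a Dirichlet process mixture model so that the number of skills does not have to be fixed in advance, can be sketched roughly as follows. This is a minimal illustration using a truncated variational DP mixture (scikit-learn's BayesianGaussianMixture) as a stand-in for the sampling-based DPMM inference typically used in this line of work; the subgoal features, responsibility threshold, and component-weight cutoff are all assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Hypothetical agent-space subgoal data: each row is the agent-centric feature
# vector recorded when a subgoal was identified in some source task.
subgoal_data = np.random.RandomState(0).randn(200, 4)

# Truncated variational approximation to a Dirichlet process mixture:
# n_components is only an upper bound; components the data do not support
# receive near-zero mixture weight.
dpmm = BayesianGaussianMixture(
    n_components=20,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,   # DP concentration parameter (alpha)
    covariance_type="full",
    max_iter=500,
    random_state=0,
)
dpmm.fit(subgoal_data)

# Keep only components that actually explain data; each surviving cluster
# serves as the template for one skill's termination condition.
active_components = np.flatnonzero(dpmm.weights_ > 0.01)
print(f"discovered {len(active_components)} candidate skills")

def skill_terminates(component, observation, threshold=0.5):
    """A state terminates the skill if its posterior responsibility for the
    skill's cluster exceeds a (hypothetical) threshold."""
    resp = dpmm.predict_proba(observation.reshape(1, -1))[0, component]
    return resp > threshold
```

Because the clustering is done over agent-space features (defined relative to the agent rather than to any one environment's state space), subgoals from different tasks that correspond to the same underlying skill fall into the same cluster, which is what makes the resulting termination conditions portable; the variational truncation above simply plays the role of the nonparametric inference in determining how many such clusters the data support.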
