Learning Skills in Reinforcement Learning Using Relative Novelty

We present a method for automatically creating a set of useful temporally-extended actions, or skills, in reinforcement learning. Our method identifies states that allow the agent to transition to a different region of the state space—for example, a doorway between two rooms—and generates temporally-extended actions that efficiently take the agent to these states. In identifying such states we use the concept of relative novelty, a measure of how much short-term novelty a state introduces to the agent. The resulting algorithm is simple, has low computational complexity, and is shown to improve performance in a number of problems.
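As a rough illustration of the relative-novelty idea described above, the sketch below scores each state along a trajectory by comparing the novelty of the states that follow it with the novelty of the states that precede it, within a short window. The specific choices here (novelty taken as the inverse square root of the visit count, a fixed window length `lag`, and the helper name `relative_novelty_scores`) are assumptions made for a minimal example, not necessarily the paper's exact formulation.

```python
import math
from collections import defaultdict


def relative_novelty_scores(trajectory, visit_counts, lag=7):
    """Score each visited state by its relative novelty: the average
    novelty of the states that follow it divided by the average novelty
    of the states that precede it, within a window of `lag` steps.

    Novelty of a state s is assumed to be visit_counts[s] ** -0.5
    (rarely visited states are more novel); this is an illustrative
    choice, not necessarily the paper's definition.
    """
    def novelty(s):
        n = visit_counts[s]
        return n ** -0.5 if n > 0 else 1.0  # unvisited states are maximally novel

    scores = []  # (state, relative novelty) for each scored visit
    for t, s in enumerate(trajectory):
        before = trajectory[max(0, t - lag):t]
        after = trajectory[t + 1:t + 1 + lag]
        if not before or not after:
            continue  # need context on both sides of the visit
        forward = sum(novelty(x) for x in after) / len(after)
        backward = sum(novelty(x) for x in before) / len(before)
        scores.append((s, forward / backward))
    return scores


# Toy usage: "door" sits between a well-visited region (a, b, c) and a
# rarely visited one (x, y, z), so it should receive the largest score.
counts = defaultdict(int, {"a": 50, "b": 60, "c": 55, "door": 5,
                           "x": 1, "y": 1, "z": 1})
traj = ["a", "b", "c", "door", "x", "y", "z"]
print(relative_novelty_scores(traj, counts, lag=3))
```

States that repeatedly receive high scores are natural subgoal candidates (the doorway in the toy trajectory above); the method then generates temporally-extended actions whose policies take the agent efficiently to those states.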
