Hierarchical Memory-Based Reinforcement Learning
A key challenge for reinforcement learning is scaling up to large partially observable domains. In this paper, we show how a hierarchy of behaviors can be used to create and select among variable-length short-term memories appropriate for a task. At higher levels in the hierarchy, the agent abstracts over lower-level details and looks back over a variable number of high-level decisions in time. We formalize this idea in a framework called Hierarchical Suffix Memory (HSM). HSM uses a memory-based SMDP learning method to rapidly propagate delayed reward across long decision sequences. We describe a detailed experimental study comparing memory versus hierarchy using the HSM framework on a realistic corridor navigation task.
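The central data structure the abstract describes, a variable-length suffix memory that indexes value estimates by the longest stored suffix of the recent decision history, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and method names (`SuffixMemory`, `extend`, `lookup`) and the flat dictionary of Q-values per node are assumptions for exposition.

```python
class SuffixNode:
    """One node in a suffix tree over recent (observation, action) history.
    Deeper nodes correspond to longer remembered suffixes."""
    def __init__(self):
        self.children = {}  # history item -> child SuffixNode
        self.q = {}         # action -> Q-value estimate at this suffix

class SuffixMemory:
    """Variable-length short-term memory: Q-values are indexed by the
    longest stored suffix of the recent history, matched most-recent-first."""
    def __init__(self):
        self.root = SuffixNode()

    def lookup(self, history):
        # Walk from the most recent history item backwards in time,
        # descending as deep as the stored tree allows.
        node = self.root
        for item in reversed(history):
            if item not in node.children:
                break
            node = node.children[item]
        return node

    def extend(self, history, depth):
        # Ensure a path exists for the most recent `depth` items,
        # lengthening the memory used to distinguish this situation.
        node = self.root
        for item in reversed(history[-depth:]):
            node = node.children.setdefault(item, SuffixNode())
        return node
```

In a hierarchical setting, one such memory would exist per level, with higher levels storing suffixes of abstract decisions rather than primitive observations; delayed reward can then be propagated along the matched suffix path, consistent with the SMDP-learning idea in the abstract.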