Learning Exploration Policies with Models
Reinforcement learning can profit greatly from world models that are updated by experience and used for computing policies. Fast discovery of near-optimal policies, however, requires focusing on "useful" experiences. Using an additional exploration model, we learn an exploration policy maximizing "exploration rewards" for visits to states that promise information gain. We augment this approach with an extension of Kaelbling's Interval Estimation algorithm to the model-based case. Experimental results in stochastic environments demonstrate the advantages of this hybrid approach.
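To illustrate the two ingredients named in the abstract, here is a minimal sketch, not the paper's actual algorithm: a count-based exploration reward as a simple proxy for information gain, and an Interval Estimation style rule that acts greedily on the upper confidence bound of an empirical estimate. All function names, the confidence factor `z`, and the two-action demo environment are illustrative assumptions, not taken from the paper.

```python
import math
import random

def exploration_bonus(n_visits, scale=1.0):
    # Count-based proxy for expected information gain:
    # rarely visited states yield a larger exploration reward.
    return scale / math.sqrt(n_visits + 1)

def ie_upper_bound(mean_estimate, n_visits, z=1.96):
    # Interval Estimation style optimism: score an option by the
    # upper end of a confidence interval around its empirical mean
    # (assumes a fixed noise scale for simplicity).
    return mean_estimate + z / math.sqrt(n_visits + 1)

# Tiny demo with two noisy options: acting greedily on the IE upper
# bound keeps revisiting the less-tried option until its interval
# shrinks, then concentrates on the truly better one.
random.seed(0)
true_means = [0.2, 0.8]          # hypothetical ground truth, unknown to the agent
counts, sums = [0, 0], [0.0, 0.0]
for _ in range(500):
    bounds = [ie_upper_bound(sums[a] / counts[a] if counts[a] else 0.0,
                             counts[a]) for a in (0, 1)]
    a = max((0, 1), key=lambda i: bounds[i])
    r = true_means[a] + random.gauss(0, 0.1)
    counts[a] += 1
    sums[a] += r

print(counts)  # the better option accumulates most of the visits
```

In a model-based setting, a bonus like `exploration_bonus` would be added to the modeled reward before computing the exploration policy, so that planning itself steers the agent toward poorly known parts of the state space.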