Paper Information - A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory

We describe a reinforcement learning algorithm for partially observable environments that uses short-term memory, which we call BLHT. Because BLHT learns a stochastic model by Bayesian learning, it handles the over-fitting problem reasonably well. Moreover, BLHT admits an efficient implementation. This paper shows that the model learned by BLHT converges to the one that provides the most accurate predictions of percepts and rewards, given short-term memory.

Akira Hayashi | Nobuo Suematsu
