Paper information - Reinforcement Learning Using Approximate Belief States

Reinforcement Learning Using Approximate Belief States

The problem of developing good policies for partially observable Markov decision problems (POMDPs) remains one of the most challenging areas of research in stochastic planning. One line of research in this area involves the use of reinforcement learning with belief states, that is, probability distributions over the underlying model states. This is a promising method for small problems, but its application is limited by the intractability of computing or representing a full belief state for large problems. Recent work shows that, in many settings, we can maintain an approximate belief state that is fairly close to the true belief state. In particular, great success has been shown with approximate belief states that marginalize out correlations between state variables. In this paper, we investigate two methods of full belief state reinforcement learning and one novel method for reinforcement learning using factored approximate belief states. We compare the performance of these algorithms on several well-known problems from the literature. Our results demonstrate the importance of approximate belief state representations for large problems.
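To make the two kinds of belief state in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a small discrete POMDP with a transition array `T[a, s, s']` and an observation array `O[a, s', o]`; those names, and the helper functions `belief_update`, `factor_belief`, and `product_of_marginals`, are hypothetical and chosen only for illustration. The first function is the standard exact Bayes update of a full belief state; the other two show what "marginalizing out correlations between state variables" means, by replacing a joint belief with the product of its per-variable marginals. The reinforcement learning step itself is not shown.

```python
import numpy as np

def belief_update(belief, T, O, action, obs):
    """Exact Bayes filter: b'(s') is proportional to O[a, s', o] * sum_s b(s) T[a, s, s']."""
    predicted = belief @ T[action]                # push the belief through the dynamics
    unnormalized = predicted * O[action][:, obs]  # weight by the observation likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / total

def factor_belief(belief, shape):
    """Return one marginal per state variable, discarding cross-variable correlations.

    `shape` gives the number of values of each state variable (an assumption:
    the flat state index enumerates variable values in row-major order).
    """
    joint = np.asarray(belief).reshape(shape)
    marginals = []
    for axis in range(len(shape)):
        other_axes = tuple(i for i in range(len(shape)) if i != axis)
        marginals.append(joint.sum(axis=other_axes))
    return marginals

def product_of_marginals(marginals):
    """Rebuild the approximate joint belief implied by independent marginals."""
    joint = marginals[0]
    for m in marginals[1:]:
        joint = np.multiply.outer(joint, m)
    return joint.ravel()
```

The factored form stores only the sum of the variables' domain sizes rather than their product, which is the source of the savings on large problems, at the cost of losing any correlation present in the true belief.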
