Value-Directed Belief State Approximation for POMDPs

We consider the problem of belief-state monitoring for the purpose of implementing a policy for a partially observable Markov decision process (POMDP), specifically the question of how one might approximate the belief state. Other schemes for belief-state approximation (e.g., those based on minimizing a measure such as the KL-divergence between the true and estimated belief states) are not necessarily appropriate for POMDPs. Instead, we propose a framework for analyzing value-directed approximation schemes, in which approximation quality is determined by the expected error in utility rather than by the error in the belief state itself. We propose heuristic methods, exhibiting anytime characteristics, for finding good projection schemes for belief-state estimation given a POMDP value function. We also describe several algorithms for constructing bounds on the error in decision quality (expected utility) incurred by acting in accordance with a given belief-state approximation.
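To make the value-directed criterion concrete, here is a minimal sketch (an illustration under stated assumptions, not the paper's algorithm). It assumes the POMDP value function is given as a set of alpha-vectors, so that V(b) = max_alpha alpha . b, and it contrasts a plain L1 distance between a true and an approximate belief with the utility loss from acting greedily on the approximation. The helper names (value, best_action, value_directed_error) are hypothetical.

```python
import numpy as np

def value(belief, alpha_vectors):
    """Piecewise-linear convex POMDP value: V(b) = max_alpha alpha . b."""
    return max(float(np.dot(a, belief)) for a in alpha_vectors)

def best_action(belief, alpha_vectors, actions):
    """Greedy action: the action whose alpha-vector maximizes alpha . b.
    (Assumes one alpha-vector per action, for simplicity.)"""
    scores = [float(np.dot(a, belief)) for a in alpha_vectors]
    return int(np.argmax(scores))

def value_directed_error(b_true, b_hat, alpha_vectors, actions):
    """Utility loss from acting on the approximate belief: evaluate the
    action chosen under b_hat at the true belief b_true."""
    idx = best_action(b_hat, alpha_vectors, actions)
    return value(b_true, alpha_vectors) - float(np.dot(alpha_vectors[idx], b_true))

# Toy example: two states, two actions, one alpha-vector per action.
alpha_vectors = [np.array([1.0, 0.0]), np.array([0.2, 0.8])]
actions = ["a0", "a1"]
b_true = np.array([0.55, 0.45])
b_hat = np.array([0.45, 0.55])   # an approximation of b_true

print("L1 distance:         ", np.abs(b_true - b_hat).sum())      # 0.2
print("value-directed error:", value_directed_error(b_true, b_hat,
                                                    alpha_vectors, actions))
```

In this toy case the two beliefs are 0.2 apart in L1, yet the utility loss (0.08 here) depends entirely on whether the approximation flips the induced action; a different pair of beliefs at the same L1 distance could incur zero loss. This is the sense in which distance-based measures need not track decision quality, motivating error measured in expected utility instead.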
