Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations

Partially observable Markov decision processes (POMDPs) provide a general model for decision-theoretic planning problems, allowing trade-offs between various courses of action to be determined under conditions of uncertainty, and incorporating the partial observations made by an agent. Dynamic programming algorithms based on the agent's belief state can be used to construct optimal policies without explicit consideration of past history, but at high computational cost. In this paper, we discuss how structured representations of system dynamics can be incorporated into classic POMDP solution algorithms. We represent POMDPs using Bayesian networks with structured conditional probability matrices, and use this model to structure the belief space for POMDP algorithms, allowing irrelevant distinctions to be ignored. Apart from speeding up optimal policy construction, we suggest that such representations can be exploited in the development of useful approximation methods.
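For reference, the belief-state update underlying these dynamic programming algorithms is a standard Bayesian filtering step. The sketch below assumes a flat (tabular) transition model `T` and observation model `Z`; the array layouts and names are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Return the posterior belief after taking action a and observing o.

    b : current belief over states, shape (S,)
    a : action index
    o : observation index
    T : transition model, T[a, s, s2] = Pr(s2 | s, a), shape (A, S, S)
    Z : observation model, Z[a, s2, o] = Pr(o | s2, a), shape (A, S, O)
    """
    predicted = T[a].T @ b            # Pr(s2 | b, a): marginalize out s
    unnorm = Z[a][:, o] * predicted   # weight by the observation likelihood
    return unnorm / unnorm.sum()      # renormalize to a distribution

# Tiny two-state example (hypothetical numbers):
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])          # one action
Z = np.array([[[0.75, 0.25],
               [0.30, 0.70]]])        # two observations
b0 = np.array([0.5, 0.5])
print(belief_update(b0, a=0, o=0, T=T, Z=Z))  # ~[0.753, 0.247]
```

The structured representations discussed in the paper aim to avoid enumerating this flat state space: when the conditional probability matrices exhibit regularities, the same update can operate over aggregated states, ignoring distinctions that are irrelevant to the value function.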
