Nonapproximability Results for Partially Observable Markov Decision Processes

We show that for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies either provably lack, or are unlikely to have, guarantees of finding policies within a constant factor or a constant summand of optimal. Here "unlikely" means "unless some complexity classes collapse," where the collapses considered are P = NP, P = PSPACE, or P = EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation.
