Nonapproximability Results for Partially Observable Markov Decision Processes

We show that for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies either provably lack, or are unlikely to have, guarantees of finding policies within a constant factor or a constant summand of optimal. Here "unlikely" means "unless some complexity classes collapse," where the collapses considered are P = NP, P = PSPACE, or P = EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation.
