On the Complexity of Partially Observed Markov Decision Processes

Abstract. In this paper we consider the complexity of constructing optimal policies (strategies) for a certain type of partially observed Markov decision process. This special case of the classical problem deals with finite stationary processes, and can be represented as constructing optimal strategies for reaching target vertices from a starting vertex in a graph with colored vertices and probabilistic deviations from the edge chosen to follow. The colors of the visited vertices are the only information available to a strategy. The complexity of Markov decision processes in the case of perfect information (a bijective coloring of the vertices) is known and is briefly surveyed at the beginning of the paper. For the unobservable case (all colors equal) we improve a result of Papadimitriou and Tsitsiklis by showing that constructing even a very weak approximation to an optimal strategy is NP-hard. Our main results concern the case of a fixed bound on the multiplicity of the coloring, that is, the case of partially observed processes in which an upper bound on the unobservability is assumed. We show that the problem of finding an optimal strategy remains NP-hard, but polynomial-time approximations are possible. Some relations of our results to the Max-Word Problem are also indicated.

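The colored-graph model described in the abstract can be made concrete with a short simulation. The Python sketch below is illustrative only and is not code from the paper: the names (ColoredGraphMDP, run, eps) are invented for this example, and deviating to a uniformly random neighbor is one plausible reading of "probabilistic deviations from the edge chosen to follow".

```python
import random

# Minimal sketch of the colored-graph model (illustrative names, not from
# the paper). Vertices carry colors; a strategy observes only the color
# sequence of the visited vertices, never the vertices themselves.

class ColoredGraphMDP:
    def __init__(self, neighbors, color, eps):
        self.neighbors = neighbors  # vertex -> list of successor vertices
        self.color = color          # vertex -> observable color
        self.eps = eps              # assumed probability of deviating

    def step(self, v, chosen):
        # Follow the chosen edge, or (with probability eps) deviate to a
        # uniformly random neighbor -- one reading of the paper's model.
        if random.random() < self.eps:
            return random.choice(self.neighbors[v])
        return chosen

def run(mdp, strategy, start, targets, horizon=100):
    """Simulate a strategy mapping the observed color history to the index
    of an outgoing edge; return True iff a target vertex is reached."""
    v, history = start, [mdp.color[start]]
    for _ in range(horizon):
        if v in targets:
            return True
        succ = mdp.neighbors[v]
        v = mdp.step(v, succ[strategy(history) % len(succ)])
        history.append(mdp.color[v])
    return v in targets

# Toy instance: three vertices, two colors.
toy = ColoredGraphMDP(
    neighbors={0: [1, 2], 1: [2], 2: [2]},
    color={0: "red", 1: "blue", 2: "blue"},
    eps=0.1,
)
# A trivial strategy that ignores its observations: always pick edge 0.
print(run(toy, lambda history: 0, start=0, targets={2}))
```

In this toy instance two vertices share the color "blue", so the coloring has multiplicity two; bounding this multiplicity is exactly the restriction under which the abstract states that the problem remains NP-hard but admits polynomial-time approximations.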
[1] George L. Nemhauser et al., Handbooks in Operations Research and Management Science, 1989.

[2] D. P. Bertsekas, Dynamic Programming and Stochastic Control, Academic Press, 1976.

[3] John G. Kemeny and J. Laurie Snell, Finite Markov Chains, Van Nostrand, 1960.

[4] M. L. Puterman, Markov decision processes, Chapter 8 in Handbooks in Operations Research and Management Science, Vol. 2, 1990.

[5] Christos H. Papadimitriou and Mihalis Yannakakis, Shortest Paths Without a Map, Theor. Comput. Sci., 1991.

[6] L. C. M. Kallenberg, Linear Programming and Finite Markovian Control Problems, 1984.

[7] Jan van Leeuwen (ed.), Handbook of Theoretical Computer Science. Part A: Algorithms and Complexity; Part B: Formal Models and Semantics, 1990.

[8] Michael R. Garey and David S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman, 1979.

[9] William Feller, An Introduction to Probability Theory and Its Applications, 1950.

[10] Christos H. Papadimitriou and John N. Tsitsiklis, The Complexity of Markov Decision Processes, Math. Oper. Res., 1987.

[11] Martin L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[12] Anne Condon, The Complexity of the Max Word Problem, in Proc. STACS, 1991.

[13] David S. Johnson, A Catalog of Complexity Classes, in Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, 1990.

[14] William Feller, An Introduction to Probability Theory and Its Applications, 1967.

[15] Xiaotie Deng et al., How to learn an unknown environment, in Proceedings of the 32nd Annual Symposium on Foundations of Computer Science, 1991.

[16] Charles J. Colbourn, The Combinatorics of Network Reliability, 1987.

[17] Christos H. Papadimitriou, Games against nature, in Proc. 24th Annual Symposium on Foundations of Computer Science, 1983; journal version in J. Comput. Syst. Sci., 1985.

[18] Leslie G. Valiant, The Complexity of Enumeration and Reliability Problems, SIAM J. Comput., 1979.

[19] Juan Francisco Díaz-Frías, A theory of robust planning, in Proceedings of the 1992 IEEE International Conference on Robotics and Automation, 1992.