Paper information - The Complexity of Markov Decision Processes

The Complexity of Markov Decision Processes

We investigate the complexity of the classical problem of optimal policy computation in Markov decision processes. All three variants of the problem (finite horizon, infinite horizon discounted, and infinite horizon average cost) were known to be solvable in polynomial time by dynamic programming (finite horizon problems), linear programming, or successive approximation techniques (infinite horizon). We show that they are complete for P, and therefore most likely cannot be solved by highly parallel algorithms. We also show that, in contrast, the deterministic cases of all three problems can be solved very fast in parallel. The version with partially observed states is shown to be PSPACE-complete, and thus even less likely to be solved in polynomial time than the NP-complete problems; in fact, we show that, most likely, it is not possible to have an efficient on-line implementation (involving polynomial time on-line computations and memory) of an optimal policy, even if an arbitrary amount of precomputation is allowed. Finally, the variant of the problem in which there are no observations is shown to be NP-complete.
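The abstract's remark that finite horizon problems are solvable in polynomial time by dynamic programming can be illustrated with a minimal backward-induction sketch. The function name `backward_induction`, the array layout (`P[a][s][t]` for transition probabilities, `c[a][s]` for immediate costs), and the zero terminal cost are assumptions made for illustration; they are not notation taken from the paper.

```python
from typing import List, Tuple


def backward_induction(
    P: List[List[List[float]]],  # P[a][s][t]: probability of moving s -> t under action a (assumed layout)
    c: List[List[float]],        # c[a][s]: immediate cost of taking action a in state s (assumed layout)
    horizon: int,
) -> Tuple[List[float], List[List[int]]]:
    """Return (optimal expected cost per start state, policy[k][s]) for a finite horizon."""
    n_actions = len(P)
    n_states = len(P[0])
    value = [0.0] * n_states      # assumption: terminal cost is zero
    policy: List[List[int]] = []

    # Work backwards from the last decision stage to the first.
    for _ in range(horizon):
        new_value: List[float] = []
        decisions: List[int] = []
        for s in range(n_states):
            best_a, best_q = 0, float("inf")
            for a in range(n_actions):
                # Immediate cost plus expected cost-to-go of the successor state.
                q = c[a][s] + sum(P[a][s][t] * value[t] for t in range(n_states))
                if q < best_q:
                    best_a, best_q = a, q
            new_value.append(best_q)
            decisions.append(best_a)
        value = new_value
        policy.append(decisions)

    policy.reverse()  # policy[0] is now the decision rule for the first stage
    return value, policy


# Hypothetical two-state example: action 0 stays put, action 1 switches state.
P_example = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
c_example = [
    [2.0, 0.0],  # staying costs 2 in state 0 and 0 in state 1
    [1.0, 1.0],  # switching costs 1 in either state
]
values, plan = backward_induction(P_example, c_example, horizon=3)
```

Each stage costs on the order of (number of actions) × (number of states)² arithmetic operations, so the total work is polynomial in the input size and the horizon. The stages must be processed one after another, which is consistent with the abstract's point that the general problem is unlikely to admit a highly parallel algorithm.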

John N. Tsitsiklis | Christos H. Papadimitriou
