Modified Policy Iteration Algorithms for Discounted Markov Decision Problems

In this paper we study a class of modified policy iteration algorithms for solving discounted Markov decision problems, in which policy evaluation is performed by successive approximations. We discuss the relationship of these algorithms to Newton-Kantorovich iteration and demonstrate their convergence. We show that all of these algorithms converge at least as quickly as successive approximations and obtain estimates of their rates of convergence. An analysis of their computational requirements suggests that they may be appropriate for solving problems with large numbers of actions, large numbers of states, sparse transition matrices, or small discount rates. These algorithms are compared with policy iteration, successive approximations, and Gauss-Seidel methods on large, randomly generated test problems.
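The scheme described above interleaves a policy-improvement step with a fixed number m of successive-approximation sweeps of policy evaluation; m = 0 recovers ordinary successive approximations (value iteration), while large m approaches policy iteration. A minimal NumPy sketch of this idea, under assumed tabular conventions (rewards `r[a, s]`, transitions `P[a, s, s']`, and a simple sup-norm stopping test, none of which are taken verbatim from the paper):

```python
import numpy as np

def modified_policy_iteration(P, r, gamma, m=5, tol=1e-8, max_iter=10_000):
    """Modified policy iteration for a discounted MDP (illustrative sketch).

    P     : array of shape (A, S, S); P[a, s, s'] = Pr(s' | s, a)
    r     : array of shape (A, S); one-step reward for action a in state s
    gamma : discount factor in (0, 1)
    m     : number of extra successive-approximation sweeps used to
            partially evaluate each greedy policy.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    pi = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        # Policy improvement: one Bellman-optimality backup.
        q = r + gamma * (P @ v)            # shape (A, S)
        pi = q.argmax(axis=0)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, pi
        # Partial policy evaluation: m sweeps of v <- r_pi + gamma * P_pi v
        # with the greedy policy pi held fixed.
        r_pi = r[pi, np.arange(S)]         # shape (S,)
        P_pi = P[pi, np.arange(S), :]      # shape (S, S)
        v = v_new
        for _ in range(m):
            v = r_pi + gamma * (P_pi @ v)
    return v, pi

# Hypothetical two-state, two-action MDP: action 1 always pays 1 and
# moves to state 1, so the optimal value is 1/(1 - gamma) in every state.
P = np.array([[[1.0, 0.0], [1.0, 0.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
r = np.array([[0.0, 0.0], [1.0, 1.0]])
v, pi = modified_policy_iteration(P, r, gamma=0.9, m=5)
```

The single parameter m trades per-iteration cost against the number of improvement steps, which is the computational lever the paper analyzes when recommending these methods for large or sparse problems.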
