Ergodic MDPs Admit Self-Optimising Policies

Markov decision processes (MDPs) are an important class of dynamic systems with many applications. Intuitively, it seems clear that if an MDP is ergodic then it should admit self-optimising policies: ergodicity ensures that the MDP’s state-transition space can be freely explored, which should allow a sufficiently accurate model of the MDP to be constructed. In this paper we prove that this intuition is indeed correct, though the full analysis is surprisingly complex.
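The core of the intuition can be illustrated with a small simulation (this is an illustrative sketch, not the paper's construction): in an ergodic MDP every state–action pair is visited infinitely often under any sufficiently exploratory policy, so empirical transition frequencies converge to the true transition probabilities. The toy MDP below (two states, two actions, with made-up transition probabilities `P`) is ergodic by construction, and a uniformly random exploration policy recovers its model:

```python
import random

# Hypothetical 2-state, 2-action ergodic MDP (illustrative values only).
# P[(s, a)] = probability that taking action a in state s leads to state 1.
# Every entry is strictly between 0 and 1, so both states are reachable
# from everywhere and the chain is ergodic under any exploring policy.
P = {(0, 0): 0.7, (0, 1): 0.2, (1, 0): 0.4, (1, 1): 0.9}

def step(s, a, rng):
    """Sample the next state of the true (unknown to the agent) MDP."""
    return 1 if rng.random() < P[(s, a)] else 0

def estimate_model(steps=200000, seed=0):
    """Estimate transition probabilities from a single exploratory run.

    Ergodicity guarantees each (state, action) pair is tried many times,
    so the empirical frequencies converge to the true probabilities.
    """
    rng = random.Random(seed)
    counts = {k: 0 for k in P}   # visits to each (state, action) pair
    hits = {k: 0 for k in P}     # how often that pair led to state 1
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)     # uniform exploration
        s_next = step(s, a, rng)
        counts[(s, a)] += 1
        hits[(s, a)] += s_next
        s = s_next
    return {k: hits[k] / counts[k] for k in P}

estimate = estimate_model()
```

With 200,000 steps each pair is sampled tens of thousands of times, so every estimated probability lands within a couple of hundredths of the true value; an agent can then compute a near-optimal policy from the learned model. The paper's contribution is proving that this convergence argument goes through in full generality.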