Ergodic MDPs Admit Self-Optimising Policies

Markov decision processes (MDPs) are an important class of dynamic systems with many applications. Intuitively, it seems clear that if an MDP is ergodic then it should admit self-optimising policies: ergodicity ensures that the MDP’s state-transition space can be freely explored, which should allow a sufficiently accurate model of the MDP to be constructed. In this paper we prove that this intuition is indeed correct, though the full analysis is surprisingly complex.
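The core of the intuition can be illustrated with a small simulation (this is an illustrative sketch, not the paper's construction): in an ergodic MDP every state–action pair is visited infinitely often under any sufficiently exploratory policy, so empirical transition frequencies converge to the true transition probabilities. The toy MDP below (two states, two actions, with made-up transition probabilities `P`) is ergodic by construction, and a uniformly random exploration policy recovers its model:

```python
import random

# Hypothetical 2-state, 2-action ergodic MDP (illustrative values only).
# P[(s, a)] = probability that taking action a in state s leads to state 1.
# Every entry is strictly between 0 and 1, so both states are reachable
# from everywhere and the chain is ergodic under any exploring policy.
P = {(0, 0): 0.7, (0, 1): 0.2, (1, 0): 0.4, (1, 1): 0.9}

def step(s, a, rng):
    """Sample the next state of the true (unknown to the agent) MDP."""
    return 1 if rng.random() < P[(s, a)] else 0

def estimate_model(steps=200000, seed=0):
    """Estimate transition probabilities from a single exploratory run.

    Ergodicity guarantees each (state, action) pair is tried many times,
    so the empirical frequencies converge to the true probabilities.
    """
    rng = random.Random(seed)
    counts = {k: 0 for k in P}   # visits to each (state, action) pair
    hits = {k: 0 for k in P}     # how often that pair led to state 1
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)     # uniform exploration
        s_next = step(s, a, rng)
        counts[(s, a)] += 1
        hits[(s, a)] += s_next
        s = s_next
    return {k: hits[k] / counts[k] for k in P}

estimate = estimate_model()
```

With 200,000 steps each pair is sampled tens of thousands of times, so every estimated probability lands within a couple of hundredths of the true value; an agent can then compute a near-optimal policy from the learned model. The paper's contribution is proving that this convergence argument goes through in full generality.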