A new family of optimal adaptive controllers for Markov chains

We consider the problem of adaptively controlling a Markov chain with unknown transition probabilities. A new family of adaptive controllers is exhibited which achieves performance precisely equal to the optimal performance achievable if the transition probabilities (i.e., the model or dynamics of the system) were known instead. Hence, the adaptive controllers presented here are truly optimal. The performance of the system to be controlled is measured by the average of the costs incurred over an infinite operating time period. These adaptive controllers can potentially be implemented on digital computers and used in the on-line control of unknown systems.
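As a point of reference for the performance criterion mentioned above, a standard formulation of the long-run average-cost objective is sketched below; the notation (state $x_t$, control $u_t$, one-step cost $c$, control law $\phi$) is assumed here and is not taken from the abstract itself:
\[
  J(\phi) \;=\; \limsup_{N \to \infty} \; \frac{1}{N} \, E_{\phi}\!\left[ \sum_{t=0}^{N-1} c(x_t, u_t) \right],
\]
where the expectation is taken over trajectories of the Markov chain generated by the control law $\phi$. Under this reading, the claim of the abstract is that the proposed adaptive controllers, which do not know the transition probabilities, attain the same value $J^{*} = \min_{\phi} J(\phi)$ that an optimal controller with full knowledge of the transition probabilities would attain.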