Adaptive control of Markov chains, I: Finite parameter set

Consider a controlled Markov chain whose transition probabilities depend upon an unknown parameter α taking values in a finite set A. To each α in A is associated a prespecified stationary control law φ(α). The adaptive control law selects at each time t the control action indicated by φ(α_t), where α_t is the maximum likelihood estimate of α. It is shown that α_t converges to a parameter α* such that the "closed loop" transition probabilities corresponding to α* and φ(α*) are the same as those corresponding to α_0 and φ(α*), where α_0 is the true parameter. The situation when α_0 does not belong to the model set A is briefly discussed.
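
The scheme described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the two-state, two-action toy tables TOY_P and TOY_PHI, and the rule that likelihood ties go to the first parameter listed. The toy model is chosen so that both parameters give identical transitions under action 0, which means the estimate can settle on a value other than the true parameter while the closed-loop transition probabilities still agree.

```python
import math
import random


def simulate_ml_adaptive_control(P, phi, true_param, steps, x0=0, seed=0):
    """Run the maximum likelihood adaptive control loop (illustrative sketch).

    P[a][u][x][y] : probability of moving from state x to y under action u
                    when the parameter is a (hypothetical table layout).
    phi[a][x]     : action prescribed in state x by the stationary law for a.
    Returns the list of estimates used at each time step.
    """
    rng = random.Random(seed)
    params = list(P)                      # the finite parameter set
    loglik = {a: 0.0 for a in params}     # running log-likelihoods
    x = x0
    estimates = []
    for _ in range(steps):
        # Maximum likelihood estimate; ties go to the first parameter listed
        # (an assumption of this sketch).
        a_hat = max(params, key=lambda a: loglik[a])
        u = phi[a_hat][x]                 # act as if a_hat were the truth
        # The next state is drawn from the TRUE model under the chosen action.
        row = P[true_param][u][x]
        y = rng.choices(range(len(row)), weights=row)[0]
        # Update every candidate's log-likelihood with the observed transition.
        for a in params:
            p = P[a][u][x][y]
            loglik[a] += math.log(p) if p > 0 else -math.inf
        estimates.append(a_hat)
        x = y
    return estimates


def closed_loop_matches(P, phi, a_star, a_true):
    """Check that a_star and a_true give the same transitions under phi[a_star]."""
    n_states = len(phi[a_star])
    return all(
        P[a_star][phi[a_star][x]][x] == P[a_true][phi[a_star][x]][x]
        for x in range(n_states)
    )


# Hypothetical toy model: the parameters agree under action 0, differ under action 1.
TOY_P = {
    0: {0: [[0.9, 0.1], [0.2, 0.8]], 1: [[0.5, 0.5], [0.5, 0.5]]},
    1: {0: [[0.9, 0.1], [0.2, 0.8]], 1: [[0.3, 0.7], [0.6, 0.4]]},
}
TOY_PHI = {0: [0, 0], 1: [1, 1]}          # law for parameter a always plays action a

if __name__ == "__main__":
    est = simulate_ml_adaptive_control(TOY_P, TOY_PHI, true_param=1, steps=50)
    print("final estimate:", est[-1])
    print("closed loop agrees:", closed_loop_matches(TOY_P, TOY_PHI, est[-1], 1))
```

In this toy setup the true parameter is 1, but the loop starts from the tie-broken estimate 0, plays action 0, and never observes a transition that distinguishes the two models. The helper closed_loop_matches checks the closed-loop condition stated above for a candidate limit α* against the true parameter α_0.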