Learning control of finite Markov chains with unknown transition probabilities
For a Markovian decision problem in which the transition probabilities are unknown, two learning algorithms are devised from the viewpoint of asymptotic optimality. At each step, the algorithms select the decision to be used on the basis of not only the estimates of the unknown probabilities but also their uncertainty. It is shown that the algorithms are asymptotically optimal in the sense that the probability of selecting an optimal policy converges to unity.
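The abstract does not state the algorithms' actual decision rules. To illustrate only the general idea of choosing decisions from both the estimated transition probabilities and their uncertainty, the following is a minimal sketch of a different, generic scheme: count-based optimism, where each reward gets an exploration bonus that shrinks as the state-action pair is visited more. It assumes a discounted-reward criterion, known rewards, a uniform fallback estimate for unvisited pairs, and a hypothetical bonus `bonus_scale / sqrt(N(s, a) + 1)`. The function names are illustrative. This is not the authors' method, and it carries none of their optimality guarantees.

```python
import math
import random


def estimate_transitions(counts):
    """Empirical estimates p_hat[s][a][t] from transition counts.

    Assumption: unvisited (s, a) pairs fall back to a uniform distribution.
    """
    n_states = len(counts)
    p_hat = []
    for s in range(n_states):
        row = []
        for a in range(len(counts[s])):
            total = sum(counts[s][a])
            if total == 0:
                row.append([1.0 / n_states] * n_states)
            else:
                row.append([c / total for c in counts[s][a]])
        p_hat.append(row)
    return p_hat


def optimistic_policy(counts, rewards, gamma=0.9, bonus_scale=1.0, iters=500):
    """Value iteration on the estimated model, with a hypothetical
    uncertainty bonus bonus_scale / sqrt(N(s, a) + 1) added to each reward.
    Returns the greedy action for every state."""
    p_hat = estimate_transitions(counts)
    n_states = len(rewards)
    v = [0.0] * n_states
    q = []
    for _ in range(iters):
        q = [
            [
                rewards[s][a]
                + bonus_scale / math.sqrt(sum(counts[s][a]) + 1)
                + gamma * sum(p * v[t] for t, p in enumerate(p_hat[s][a]))
                for a in range(len(rewards[s]))
            ]
            for s in range(n_states)
        ]
        v = [max(qs) for qs in q]
    # Greedy action per state; ties go to the lowest-numbered action.
    return [max(range(len(qs)), key=qs.__getitem__) for qs in q]


def run_learning(true_p, rewards, steps=300, seed=0):
    """Adaptive loop: re-plan on the current estimates at every step,
    act, observe the next state from the (unknown) true chain, update counts."""
    rng = random.Random(seed)
    n_states = len(rewards)
    counts = [[[0] * n_states for _ in rewards[s]] for s in range(n_states)]
    state = 0
    for _ in range(steps):
        action = optimistic_policy(counts, rewards, iters=100)[state]
        next_state = rng.choices(range(n_states), weights=true_p[state][action])[0]
        counts[state][action][next_state] += 1
        state = next_state
    return counts, optimistic_policy(counts, rewards)
```

For example, with two states where reward is earned only by taking action 0 in state 1, and action 1 in state 0 moves to state 1 with probability 0.9, counts that closely match those probabilities make `optimistic_policy` return `[1, 0]`. The bonus is identical across equally visited pairs, so it does not change the ranking of actions. Re-planning at every step is the simplest design. It is also the most expensive, which is why the loop uses fewer value-iteration sweeps.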