Learning control of finite Markov chains with an explicit trade-off between estimation and control

An efficient scheme is presented for learning control of finite Markov chains with unknown dynamics, i.e., with unknown transition probabilities. The scheme is designed to optimize asymptotic system performance and to be easy to apply to models with relatively many states and decisions. At each time, a control policy is determined by maximizing a simple performance criterion that explicitly incorporates a trade-off between estimating the unknown probabilities and controlling the system. Policy determination remains easy even for large models, since the maximization can be greatly simplified by using the policy-iteration method. It is proven that, with a suitable choice of control parameter values, the scheme becomes ε-optimal as well as optimal, in the sense that the relative frequency coefficient of making optimal decisions tends to the maximum.
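As a rough illustration of the kind of loop described above, the following sketch estimates transition probabilities from observed counts and then runs policy iteration on the estimated model. It is not the paper's criterion, which the abstract does not specify. Everything here is an assumption made for illustration: the discounted-reward objective with factor `gamma`, the add-one smoothing in `estimate_transitions`, the hypothetical exploration bonus `beta / sqrt(1 + visits)` in `bonus_rewards` standing in for the estimation/control trade-off, and the iterative (rather than exact) policy evaluation.

```python
import math

def estimate_transitions(counts):
    """counts[s][a][t] = number of observed transitions s -> t under decision a.
    Returns add-one (Laplace) smoothed estimates; the smoothing is an assumption."""
    n = len(counts)
    return [[[(c + 1) / (sum(row) + n) for c in row] for row in state]
            for state in counts]

def bonus_rewards(R, counts, beta):
    """Hypothetical trade-off term: rarely tried (state, decision) pairs get a
    reward bonus that shrinks as their visit count grows."""
    return [[R[s][a] + beta / math.sqrt(1 + sum(counts[s][a]))
             for a in range(len(R[s]))] for s in range(len(R))]

def evaluate_policy(P, R, policy, gamma, sweeps=500):
    """Approximate discounted value of a fixed policy by repeated backups."""
    n = len(P)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [R[s][policy[s]]
             + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n))
             for s in range(n)]
    return v

def policy_iteration(P, R, gamma=0.9, max_rounds=100):
    """Alternate policy evaluation and greedy improvement until stable."""
    n = len(P)
    policy = [0] * n
    v = evaluate_policy(P, R, policy, gamma)
    for _ in range(max_rounds):
        def q(s, a):
            return R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n))
        new_policy = []
        for s in range(n):
            best = policy[s]
            for a in range(len(P[s])):
                if q(s, a) > q(s, best) + 1e-9:  # switch only on strict gain
                    best = a
            new_policy.append(best)
        if new_policy == policy:
            break
        policy = new_policy
        v = evaluate_policy(P, R, policy, gamma)
    return policy, v

# Toy model (made-up numbers): 2 states, 2 decisions, reward 1 only in state 1.
counts = [[[8, 2], [1, 9]],
          [[5, 5], [0, 10]]]
R = [[0.0, 0.0], [1.0, 1.0]]
P_hat = estimate_transitions(counts)
policy, values = policy_iteration(P_hat, bonus_rewards(R, counts, beta=0.0))
print(policy, values)
```

Setting `beta` above zero makes less-visited decisions look more attractive, which is one simple way to trade control performance for better estimates; the paper's own criterion and parameter choices may differ.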
