Decentralized learning in finite Markov chains

The principal contribution of this paper is a new result on the decentralized control of finite Markov chains with unknown transition probabilities and rewards. One decentralized decision maker is associated with each state in which two or more actions (decisions) are available. Each decision maker uses a simple learning scheme, requiring minimal information, to update its action choice. It is shown that, if updating is done in sufficiently small steps, the group will converge to the policy that maximizes the long-term expected reward per step. The analysis is based on learning in sequential stochastic games and on certain properties, derived in this paper, of ergodic Markov chains.
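The abstract does not name the learning scheme used by each decision maker. As an illustration only, the sketch below assumes a linear reward-inaction update, a standard scheme from the learning automata literature, applied to the action probabilities held at a single state. The function name `lri_update`, the parameter `step`, and the normalization of the reward to [0, 1] are assumptions made for this example, not details taken from the paper.

```python
def lri_update(probs, action, reward, step):
    """One linear reward-inaction step (assumed scheme, not confirmed by the source).

    probs  : current action probabilities at one state (sums to 1)
    action : index of the action that was just played
    reward : payoff normalized to [0, 1] (assumption)
    step   : small positive step size; the abstract only requires
             that updating be done in sufficiently small steps
    """
    # Shrink every probability in proportion to the reward received...
    updated = [p - step * reward * p for p in probs]
    # ...and hand the removed mass to the action that earned it.
    updated[action] += step * reward
    return updated


# Usage: a decision maker with two actions receives a full reward for action 0.
probs = lri_update([0.5, 0.5], action=0, reward=1.0, step=0.1)
```

With a zero reward the probabilities are left unchanged, which is the "inaction" half of the scheme. A smaller `step` makes each update more conservative, in line with the small-step condition stated in the abstract.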
