A new approach to the design of reinforcement schemes for learning automata

A new class of reinforcement schemes for learning automata is introduced that uses estimates of the random characteristics of the environment. Both a single automaton and a hierarchy of learning automata are considered. It is shown that, for sufficiently small values of the parameters, these algorithms converge in probability to the optimal choice of actions. Simulations indicate that the algorithms converge quite rapidly in both cases. Finally, the generality of this method of designing learning schemes is pointed out, and it is shown that a very minor modification enables the algorithm to learn in a multiteacher environment as well.
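To make the idea of an estimate-driven reinforcement scheme concrete, the following is a minimal sketch of a single automaton that keeps running estimates of each action's reward probability and shifts its action probabilities toward the action with the highest current estimate. It is an assumption-laden illustration, not the scheme defined in the paper: the pursuit-style update rule, the function name `run_estimator_automaton`, the parameters `step`, `n_steps`, `init_trials`, and the simulated binary-reward environment are all hypothetical choices made for this example.

```python
import random


def run_estimator_automaton(reward_probs, step=0.01, n_steps=5000,
                            init_trials=10, seed=0):
    """Illustrative estimator-based automaton (pursuit-style update, assumed).

    reward_probs: true reward probability of each action, used only to
    simulate the environment; the automaton never reads it directly.
    Returns the final action probabilities and reward estimates.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    probs = [1.0 / n_actions] * n_actions   # action probability vector
    counts = [0] * n_actions                # times each action was tried
    rewards = [0] * n_actions               # rewards collected per action
    estimates = [0.0] * n_actions           # estimated reward probabilities

    def try_action(a):
        # Simulated environment: binary reward with probability reward_probs[a].
        reward = rng.random() < reward_probs[a]
        counts[a] += 1
        rewards[a] += reward
        estimates[a] = rewards[a] / counts[a]

    # Assumed initialisation: sample every action a few times.
    for a in range(n_actions):
        for _ in range(init_trials):
            try_action(a)

    for _ in range(n_steps):
        a = rng.choices(range(n_actions), weights=probs)[0]
        try_action(a)
        # Move the probability vector a small step toward the unit vector
        # of the action with the highest current estimate.
        best = max(range(n_actions), key=estimates.__getitem__)
        probs = [(1.0 - step) * p for p in probs]
        probs[best] += step

    return probs, estimates


if __name__ == "__main__":
    final_probs, final_estimates = run_estimator_automaton([0.2, 0.8, 0.5])
    print("action probabilities:", final_probs)
    print("reward estimates:", final_estimates)
```

The small `step` plays the role of the small parameter values mentioned in the abstract: a smaller step gives the estimates more time to become accurate before the action probabilities commit to one action, at the cost of slower convergence.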
