Paper information - Pattern-recognizing stochastic learning automata

Pattern-recognizing stochastic learning automata

A class of learning tasks is described that combines aspects of learning automaton tasks and supervised learning pattern-classification tasks. These tasks are called associative reinforcement learning tasks. An algorithm is presented, called the associative reward-penalty, or AR-P, algorithm, for which a form of optimal performance is proved. This algorithm simultaneously generalizes a class of stochastic learning automata and a class of supervised learning pattern-classification methods related to the Robbins-Monro stochastic approximation procedure. The relevance of this hybrid algorithm is discussed with respect to the collective behaviour of learning automata and the behaviour of networks of pattern-classifying adaptive elements. Simulation results are presented that illustrate the associative reinforcement learning task and the performance of the AR-P algorithm as compared with that of several existing algorithms.
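The abstract does not spell out the update rule, so the following is only a minimal sketch of a reward-penalty update of the kind it describes, not the paper's own formulation. The logistic action probability, the binary actions 0 and 1, the parameter names `rho` (step size) and `lam` (penalty weight), the `reward_fn` callback, and the `train` helper are all assumptions made for illustration.

```python
import math
import random


def logistic(s):
    """Logistic function, used here (by assumption) as P(action = 1)."""
    return 1.0 / (1.0 + math.exp(-s))


def arp_step(theta, x, reward_fn, rho=0.5, lam=0.05, rng=random):
    """One trial of an associative reward-penalty style update (illustrative).

    theta     -- weight vector, one weight per component of the context x
    x         -- context (pattern) vector presented on this trial
    reward_fn -- hypothetical callback: reward_fn(x, action) -> True on success
    rho, lam  -- assumed step size and penalty weight (lam < 1 weakens penalties)
    """
    s = sum(t * xi for t, xi in zip(theta, x))
    p = logistic(s)                         # probability of choosing action 1
    a = 1 if rng.random() < p else 0        # stochastic action selection
    success = reward_fn(x, a)               # environment evaluates the action
    if success:
        step = rho * (a - p)                # move toward the action just taken
    else:
        step = lam * rho * ((1 - a) - p)    # move, more weakly, toward the other action
    new_theta = [t + step * xi for t, xi in zip(theta, x)]
    return new_theta, a, success


def train(contexts, reward_fn, trials=200, seed=0):
    """Cycle through the contexts in order and apply arp_step on each trial."""
    rng = random.Random(seed)
    theta = [0.0] * len(contexts[0])
    for k in range(trials):
        x = contexts[k % len(contexts)]
        theta, _, _ = arp_step(theta, x, reward_fn, rng=rng)
    return theta
```

With two one-hot contexts and an environment that rewards action 1 in the first context and action 0 in the second, `train` pushes the first weight up and the second down, so the action probability depends on the context. That context dependence is what separates the associative task from a plain learning automaton task, where there is no pattern input to condition on.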

Andrew G. Barto | P. Anandan
