Pattern-recognizing stochastic learning automata

A class of learning tasks is described that combines aspects of learning automaton tasks and supervised learning pattern-classification tasks. These tasks are called associative reinforcement learning tasks. An algorithm is presented, called the associative reward-penalty, or AR-P, algorithm, for which a form of optimal performance is proved. This algorithm simultaneously generalizes a class of stochastic learning automata and a class of supervised learning pattern-classification methods related to the Robbins-Monro stochastic approximation procedure. The relevance of this hybrid algorithm is discussed with respect to the collective behaviour of learning automata and the behaviour of networks of pattern-classifying adaptive elements. Simulation results are presented that illustrate the associative reinforcement learning task and compare the performance of the AR-P algorithm with that of several existing algorithms.
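The abstract names the AR-P rule without stating it. The sketch below shows one AR-P element as it is commonly presented. It is a stochastic binary unit that emits y = +1 when θ·x + η > 0 and y = -1 otherwise. After a success it updates by Δθ = ρ(y - E{y|θ,x})x. After a failure it updates by Δθ = λρ(-y - E{y|θ,x})x, with 0 < λ ≤ 1.

This is an illustration under stated assumptions, not the paper's implementation. The assumptions are:

* the noise η is logistic;
* the learning rate ρ is held constant rather than following a decreasing schedule;
* the two-pattern environment, its success probabilities, and all names (`ARPUnit`, `train`, `success_prob`) are hypothetical.

```python
import math
import random


def logistic(s):
    """Numerically stable logistic CDF: Pr{s + eta > 0} for standard logistic noise eta (assumed)."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    z = math.exp(s)
    return z / (1.0 + z)


class ARPUnit:
    """Hypothetical sketch of one associative reward-penalty element with outputs y in {+1, -1}."""

    def __init__(self, n_inputs, rho=0.5, lam=0.05, seed=0):
        self.theta = [0.0] * n_inputs  # weight vector theta
        self.rho = rho                 # learning rate, held constant in this sketch
        self.lam = lam                 # penalty scaling, 0 < lam <= 1
        self.rng = random.Random(seed)

    def prob_plus(self, x):
        # Pr{y = +1 | theta, x} = Pr{theta . x + eta > 0}
        s = sum(t * xi for t, xi in zip(self.theta, x))
        return logistic(s)

    def act(self, x):
        # Sample the stochastic binary output
        return 1 if self.rng.random() < self.prob_plus(x) else -1

    def update(self, x, y, success):
        expected_y = 2.0 * self.prob_plus(x) - 1.0  # E{y | theta, x}
        if success:
            # Reward: push E{y} toward the action just taken
            coeff = self.rho * (y - expected_y)
        else:
            # Penalty: push E{y} toward the opposite action, scaled down by lam
            coeff = self.lam * self.rho * (-y - expected_y)
        self.theta = [t + coeff * xi for t, xi in zip(self.theta, x)]


def train(steps=5000, seed=1):
    # Hypothetical associative task: the success probability depends on both
    # the input pattern and the action, so the better action differs per pattern.
    patterns = {"A": (1.0, 0.0), "B": (0.0, 1.0)}
    success_prob = {("A", 1): 0.9, ("A", -1): 0.2, ("B", 1): 0.1, ("B", -1): 0.8}
    env_rng = random.Random(seed)
    unit = ARPUnit(n_inputs=2, seed=seed + 1)
    for _ in range(steps):
        name = env_rng.choice(["A", "B"])
        x = patterns[name]
        y = unit.act(x)
        success = env_rng.random() < success_prob[(name, y)]
        unit.update(x, y, success)
    return unit, patterns


unit, patterns = train()
print(unit.prob_plus(patterns["A"]), unit.prob_plus(patterns["B"]))
```

After training, the unit should favour y = +1 for pattern A and y = -1 for pattern B. These are the actions with the higher success probability under each pattern. Because λ > 0, the penalty updates keep the action probabilities slightly away from 0 and 1 rather than letting them saturate.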
