Punish/Reward: Learning with a Critic in Adaptive Threshold Systems

An adaptive threshold element is able to "learn" a strategy of play for the game of blackjack (twenty-one) with a performance close to that of the Thorp optimal strategy, although the adaptive system has no prior knowledge of the game or of the objective of play. After each winning game the decisions of the adaptive system are "rewarded." After each losing game the decisions are "punished." Reward is accomplished by adapting while accepting the actual decision as the desired response. Punishment is accomplished by adapting while taking the desired response to be the opposite of the actual decision. This learning scheme is unlike "learning with a teacher" and unlike "unsupervised learning." It involves "bootstrap adaptation" or "learning with a critic." The critic rewards decisions that are members of successful chains of decisions and punishes other decisions. A general analytical model for learning with a critic is formulated and analyzed. The model represents bootstrap learning per se. Although the hypotheses on which the model is based do not perfectly fit blackjack learning, the model is applied heuristically to predict adaptation rates, with good experimental success. New applications are being explored for bootstrap learning in adaptive controls and multilayered adaptive systems.
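
The reward and punish rules described above can be illustrated with a minimal sketch. The abstract does not state the adaptation rule, so the normalized LMS-style step below is an assumption, as are the function names, the input encoding, the rate value, and the treatment of every game as either won or not won. It shows only how the desired response is chosen from the game outcome; it is not the paper's implementation.

```python
def weighted_sum(weights, x):
    """Weighted sum of the inputs of the threshold element."""
    return sum(w * xi for w, xi in zip(weights, x))

def decide(weights, x):
    """Binary decision: +1 if the weighted sum is non-negative, else -1."""
    return 1 if weighted_sum(weights, x) >= 0.0 else -1

def adapt(weights, x, desired, rate=0.1):
    """One normalized LMS-style step toward `desired` (assumed rule)."""
    error = desired - weighted_sum(weights, x)
    norm = sum(xi * xi for xi in x)
    return [w + rate * error * xi / norm for w, xi in zip(weights, x)]

def critic_update(weights, episode, won, rate=0.1):
    """Reward or punish every decision recorded during one game.

    episode: list of (x, decision) pairs made during play.
    won: True rewards the decisions; False punishes them.
    """
    for x, decision in episode:
        # Reward: the actual decision is taken as the desired response.
        # Punish: the opposite of the actual decision is taken instead.
        desired = decision if won else -decision
        weights = adapt(weights, x, desired, rate)
    return weights

# Hypothetical input pattern; the first component acts as a bias input.
x = [1.0, 0.5, -1.0]
w = [0.0, 0.0, 0.0]
d = decide(w, x)
w_rewarded = critic_update(w, [(x, d)], won=True)
w_punished = critic_update(w, [(x, d)], won=False)
```

In this sketch a reward moves the weighted sum for `x` toward the decision already made, while a punishment moves it toward the opposite decision. The critic only supplies the game outcome, never the correct decision for an individual pattern.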
