Connectionist Learning Procedures

Abstract

A major goal of research on networks of neuron-like processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way that internal units that are not part of the input or output come to represent important features of the task domain. Several interesting gradient-descent procedures have recently been discovered. In these procedures, each connection computes the derivative, with respect to its connection strength, of a global measure of the error in the performance of the network. The strength is then adjusted in the direction that decreases the error. These relatively simple gradient-descent learning procedures work well for small tasks, and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.
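The gradient-descent step described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a single linear unit with no internal units, a squared-error measure E = 1/2 * sum (y - d)^2, and hypothetical names (`train_linear_unit`, `total_error`, `learning_rate`) chosen only for illustration.

```python
def total_error(weights, samples):
    """Global error measure (assumed here): half the summed squared error."""
    err = 0.0
    for inputs, target in samples:
        output = sum(w * x for w, x in zip(weights, inputs))
        err += 0.5 * (output - target) ** 2
    return err


def train_linear_unit(samples, learning_rate=0.1, epochs=200):
    """Batch gradient descent for one linear unit (illustrative sketch)."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        # Each connection accumulates dE/dw_i over the whole training set.
        grads = [0.0] * n_inputs
        for inputs, target in samples:
            output = sum(w * x for w, x in zip(weights, inputs))
            delta = output - target  # dE/d(output) for this case
            for i, x in enumerate(inputs):
                grads[i] += delta * x
        # Adjust each strength in the direction that decreases the error.
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights


if __name__ == "__main__":
    # Toy data generated by d = 2*x1 - 1*x2 (an assumption for the demo).
    data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
    learned = train_linear_unit(data)
    print(learned, total_error(learned, data))
```

For a network with internal units, the same update applies to every connection, but the derivatives have to be propagated back through the layers; this sketch deliberately leaves that part out.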
