Paper information - Reinforcing connectionism: learning the statistical way

Reinforcing connectionism: learning the statistical way

Connectionism's main contribution to cognitive science will prove to be the renewed impetus it has imparted to learning. Learning can be integrated into the existing theoretical foundations of the subject, and the combination, statistical computational theories, provides a framework within which many connectionist mathematical mechanisms naturally fit. Examples from supervised and reinforcement learning demonstrate this.

Statistical computational theories already exist for certain associative matrix memories. This work is extended here to allow real-valued synapses and arbitrarily biased inputs. The extension shows that a covariance learning rule optimises the signal/noise ratio, a measure of the potential quality of the memory, and quantifies the performance penalty incurred by other rules. In particular, two rules that have been suggested as occurring naturally are shown to be asymptotically optimal in the limit of sparse coding. The mathematical model is justified in comparison with other treatments whose results differ.

Reinforcement comparison is a way of hastening the learning of reinforcement learning systems in statistical environments. Previous theoretical analysis has not distinguished between different comparison terms, even though, empirically, a covariance rule has been shown to be better than a constant one. The workings of reinforcement comparison are investigated by a second-order analysis of the expected statistical performance of learning, and an alternative rule is proposed and empirically justified.

The existing proof that temporal difference prediction learning converges in the mean is extended from a special case involving adjacent time steps to the general case involving arbitrary ones. The interaction between the statistical mechanism of temporal difference and the linear representation is particularly stark. The performance of the method given a linearly dependent representation is also analysed.

The method of planning using temporal difference prediction had previously been applied to solve the navigation task of finding a goal in a grid. This is extended to compare the qualities of alternative representations of the environment and to accomplish simple latent learning when the goal is initially absent. Representations that are coarse-coded are shown to perform particularly well, and latent learning can be used to form them.
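As a rough illustration of the covariance learning rule mentioned in the abstract, the sketch below stores one input/output pattern pair in a small associative matrix memory. It assumes the commonly used form of the rule, in which each synapse changes by the product of the output and input deviations from their mean activities. The abstract does not spell out the rule, so this form, the symbols `p` and `r` (assumed mean input and output activities), the binary patterns, and the function names `store_pair` and `dendritic_sums` are all illustrative assumptions rather than the thesis's own formulation.

```python
def store_pair(weights, a, b, p, r):
    """Store one pattern pair with an assumed covariance rule.

    weights[j][i] is the synapse from input unit i to output unit j.
    p and r are the assumed mean activities of input and output units.
    """
    for j in range(len(b)):
        for i in range(len(a)):
            # Change is (output deviation) * (input deviation).
            weights[j][i] += (b[j] - r) * (a[i] - p)


def dendritic_sums(weights, a):
    """Weighted sum of the input pattern arriving at each output unit."""
    return [sum(w * a_i for w, a_i in zip(row, a)) for row in weights]


# Hypothetical toy memory: 4 input units, 2 output units, all synapses zero.
weights = [[0.0] * 4 for _ in range(2)]
a = [1, 0, 1, 0]  # input pattern (assumed binary)
b = [1, 0]        # associated output pattern (assumed binary)

store_pair(weights, a, b, p=0.5, r=0.5)
sums = dendritic_sums(weights, a)
print(sums)
```

Presenting the stored input gives a larger dendritic sum on the output unit that should be active than on the one that should be silent; thresholding between the two would recover `b`. With many stored pairs the two groups of sums overlap, which is the situation a signal/noise measure is meant to quantify.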

P. Dayan
