Autonomous reinforcement of behavioral sequences in neural dynamics

We introduce a dynamic neural algorithm, Dynamic Neural (DN) SARSA(λ), for learning a behavioral sequence from delayed reward. DN-SARSA(λ) combines Dynamic Field Theory models of behavioral sequence representation, classical reinforcement learning, and a computational neuroscience model of working memory, called Item and Order working memory, which serves as an eligibility trace. DN-SARSA(λ) is implemented on both a simulated and a real robot, each of which must learn a specific rewarding sequence of elementary behaviors through exploration. The results show that DN-SARSA(λ) performs on par with discrete SARSA(λ), validating the feasibility of general reinforcement learning without compromising neural dynamics.
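For orientation, the discrete SARSA(λ) baseline mentioned above can be sketched in tabular form. This is a minimal illustration and not the DN-SARSA(λ) architecture: the toy task (an episode ends without reward as soon as a wrong behavior is chosen, and reward 1 is given only after the full target sequence), the function names `sarsa_lambda` and `greedy_sequence`, and all parameter values are assumptions made for this example. The table `E` plays the role of the eligibility trace, which in the paper is provided by the Item and Order working memory.

```python
import random

def sarsa_lambda(target, n_actions, episodes=500, alpha=0.1,
                 gamma=0.9, lam=0.8, epsilon=0.1, seed=0):
    """Tabular SARSA(lambda) on a hypothetical toy sequence task."""
    rng = random.Random(seed)
    n_states = len(target)  # state = number of correct behaviors so far
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def choose(s):
        # epsilon-greedy selection with random tie-breaking
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        best = max(Q[s])
        return rng.choice([a for a in range(n_actions) if Q[s][a] == best])

    for _ in range(episodes):
        # eligibility traces are reset at the start of every episode
        E = [[0.0] * n_actions for _ in range(n_states)]
        s, a = 0, choose(0)
        while True:
            correct = a == target[s]
            finished = correct and s + 1 == n_states
            done = finished or not correct   # assumed: a wrong behavior ends the episode
            r = 1.0 if finished else 0.0     # delayed reward, only at the end
            if done:
                q_next = 0.0
            else:
                s_next = s + 1
                a_next = choose(s_next)
                q_next = Q[s_next][a_next]
            delta = r + gamma * q_next - Q[s][a]  # on-policy TD error
            E[s][a] += 1.0                        # accumulating trace
            for i in range(n_states):
                for j in range(n_actions):
                    Q[i][j] += alpha * delta * E[i][j]
                    E[i][j] *= gamma * lam        # trace decay
            if done:
                break
            s, a = s_next, a_next
    return Q

def greedy_sequence(Q):
    """Read out the behavior with the highest value in each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in Q]
```

Calling `greedy_sequence(sarsa_lambda([2, 0, 1], n_actions=3))` reads out the sequence the agent has learned to prefer. Because of the trace, a single rewarded episode already raises the values of all behaviors in the sequence, not only the last one.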
