Credit Assignment in Multiple Goal Embodied Visuomotor Behavior

The intrinsic complexity of the brain can lead one to set aside issues related to its relationship with the body, but the field of embodied cognition emphasizes that understanding brain function at the system level requires addressing the role of the brain-body interface. It has only recently been appreciated that this interface performs huge amounts of computation that does not have to be repeated by the brain, and thus affords the brain great simplifications in its representations. In effect, the brain's abstract states can refer to coded representations of the world created by the body. But even if the brain can communicate with the world through abstractions, the severe speed limitations of its neural circuitry mean that vast amounts of indexing must be performed during development so that appropriate behavioral responses can be rapidly accessed. One way this could happen is if the brain used a decomposition whereby behavioral primitives could be quickly accessed and combined. This realization motivates our study of independent sensorimotor task solvers, which we call modules, in directing behavior. The issue we focus on here is how an embodied agent can learn to calibrate such individual visuomotor modules while pursuing multiple goals. The biologically plausible standard for module programming is reinforcement given during exploration of the environment. However, this formulation contains a substantial issue when sensorimotor modules are used in combination: the credit for their overall performance must be divided among them. We show that this problem can be solved and that diverse task combinations are beneficial in learning rather than a complication, as usually assumed. Our simulations show that fast algorithms are available that allot credit correctly and are insensitive to measurement noise.
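To make the credit-assignment idea concrete, the following is a minimal sketch of modular Sarsa(0) in which several modules act through a common action channel, observe only a single summed reward, and each maintains an estimate of its own share of that reward. The toy environment, the simple averaging rule for dividing the reward prediction error, and all hyperparameters are illustrative assumptions for a discrete setting, not the specific algorithm evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_modules = 3          # e.g., separate visuomotor tasks pursued concurrently
n_states = 5           # per-module abstract state space (toy discretization)
n_actions = 4
alpha, beta, gamma, eps = 0.1, 0.05, 0.9, 0.1   # illustrative hyperparameters

Q = np.zeros((n_modules, n_states, n_actions))   # one Q-table per module
r_hat = np.zeros(n_modules)                      # per-module reward-share estimates


def select_action(states):
    """Epsilon-greedy action on the summed module Q-values (Q-decomposition)."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    total = sum(Q[m, states[m]] for m in range(n_modules))
    return int(np.argmax(total))


def step(states, action, next_states, next_action, global_reward):
    """Sarsa(0) update per module, using estimated reward shares.

    Only the summed reward is observed; r_hat is nudged so that the module
    estimates jointly account for it (the credit-assignment step).
    """
    # Distribute the discrepancy between observed and predicted total reward.
    delta_r = global_reward - r_hat.sum()
    r_hat[:] += beta * delta_r / n_modules
    for m in range(n_modules):
        s, s2 = states[m], next_states[m]
        td = r_hat[m] + gamma * Q[m, s2, next_action] - Q[m, s, action]
        Q[m, s, action] += alpha * td


# Toy usage: random transitions and a noisy summed reward, just to
# exercise the update rule end to end.
states = [int(rng.integers(n_states)) for _ in range(n_modules)]
action = select_action(states)
for _ in range(100):
    next_states = [int(rng.integers(n_states)) for _ in range(n_modules)]
    reward = float(rng.normal())                 # noisy, summed reward signal
    next_action = select_action(next_states)
    step(states, action, next_states, next_action, reward)
    states, action = next_states, next_action
```

In this sketch each module's learned values influence the shared choice of action, while its own update uses only its estimated portion of the reward, which is the sense in which credit for joint performance is divided among modules.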
