The two-dimensional organization of behavior

This paper addresses the problem of continual learning [1] in a new way, combining multi-modular reinforcement learning with inspiration from the motor cortex to produce a unique perspective on hierarchical behavior. Most reinforcement-learning agents represent policies monolithically, using a single table or function approximator. In those cases where a policy is split among several modules, the modules are related to each other only in that they work together to produce the agent's overall policy. In contrast, the brain appears to organize motor behavior in a two-dimensional map, where nearby locations represent similar behaviors. This representation allows the brain to build hierarchies of motor behavior that correspond not to hierarchies of subroutines but to regions of the map, such that larger regions correspond to more general behaviors. Inspired by the benefits of the brain's representation, the system presented here is a first step toward, and to our knowledge the first attempt at, the two-dimensional organization of learned policies according to behavioral similarity. We demonstrate a fully autonomous multi-modular system designed for the continual accumulation of ever more sophisticated skills (the continual-learning problem). The system can split a complex task among a large number of simple modules such that nearby modules correspond to similar policies. The eventual goal is to develop and use the resulting organization hierarchically, accessing behaviors by their location and extent in the map.
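The core mechanism the abstract describes, many simple learning modules arranged on a two-dimensional map so that neighbors acquire similar policies, can be sketched by combining tabular Q-learning with a Kohonen-style neighborhood update [25], selecting the responsible module by lowest absolute TD error as in [10]. The grid size, learning rates, and function names below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

GRID = (4, 4)            # illustrative 2-D map of modules
N_STATES, N_ACTIONS = 8, 3
ALPHA, GAMMA = 0.1, 0.9  # learning rate, discount factor
SIGMA = 1.0              # neighborhood width on the map

# One small Q-table per module, laid out on the 2-D grid.
q_tables = rng.normal(0.0, 0.01, size=(*GRID, N_STATES, N_ACTIONS))

def td_error(q, s, a, r, s_next):
    """One-step Q-learning error for a single module's table."""
    return r + GAMMA * q[s_next].max() - q[s, a]

def winner(s, a, r, s_next):
    """Module whose table best predicts the transition (lowest |TD error|),
    in the spirit of Q-error-based selection [10]."""
    errs = np.abs([[td_error(q_tables[i, j], s, a, r, s_next)
                    for j in range(GRID[1])] for i in range(GRID[0])])
    return np.unravel_index(np.argmin(errs), GRID)

def update(s, a, r, s_next):
    """SOM-style update: the winning module learns most, and its map
    neighbors learn less, so nearby modules drift toward similar policies."""
    wi, wj = winner(s, a, r, s_next)
    for i in range(GRID[0]):
        for j in range(GRID[1]):
            dist2 = (i - wi) ** 2 + (j - wj) ** 2
            h = np.exp(-dist2 / (2 * SIGMA ** 2))  # neighborhood kernel
            q = q_tables[i, j]
            q[s, a] += ALPHA * h * td_error(q, s, a, r, s_next)
    return (wi, wj)
```

In this sketch, the neighborhood kernel is what produces the map's topography: without it, the modules would be an unordered collection; with it, adjacent modules end up representing similar behaviors, which is the prerequisite for later accessing behaviors by their location and extent on the map.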

[1] Chris Watkins et al., Learning from delayed rewards, 1989.

[2] Jürgen Schmidhuber et al., Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010), 2010, IEEE Transactions on Autonomous Mental Development.

[3] Thomas G. Dietterich, Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 1999, J. Artif. Intell. Res.

[4] Doina Precup et al., Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artificial Intelligence.

[5] Richard S. Sutton et al., Reinforcement Learning: An Introduction, 2005, IEEE Transactions on Neural Networks.

[6] Mitsuo Kawato et al., Multiple Model-Based Reinforcement Learning, 2002, Neural Computation.

[7] P. Grobstein, From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior, 1994.

[8] Jürgen Schmidhuber et al., Curious model-building control systems, 1991, Proceedings of the 1991 IEEE International Joint Conference on Neural Networks.

[9] Dana H. Ballard et al., Credit Assignment in Multiple Goal Embodied Visuomotor Behavior, 2010, Front. Psychology.

[10] Tom Schaul et al., Q-Error as a Selection Mechanism in Modular Reinforcement-Learning Systems, 2011, IJCAI.

[11] T. Aflalo et al., Rethinking Cortical Organization, 2007, The Neuroscientist.

[12] Mario Tokoro et al., An Adaptive Architecture for Modular Q-Learning, 1997, IJCAI.

[13] Geoffrey E. Hinton et al., Feudal Reinforcement Learning, 1992, NIPS.

[14] Domenico Parisi et al., A Bioinspired Hierarchical Reinforcement Learning Architecture for Modeling Learning of Multiple Skills with Continuous States and Actions, 2010, EpiRob.

[15] Marcus Hutter et al., Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability (Texts in Theoretical Computer Science. An EATCS Series), 2006.

[16] Jürgen Schmidhuber et al., HQ-Learning, 1997, Adapt. Behav.

[17] Mitsuo Kawato et al., Inter-module credit assignment in modular reinforcement learning, 2003, Neural Networks.

[18] Mark B. Ring, Continual learning in reinforcement environments, 1995, GMD-Bericht.

[19] Bram Bakker et al., Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization, 2005.

[20] M. Graziano, The Intelligent Movement Machine: An Ethological Perspective on the Primate Motor System, 2008.

[21] Mark B. Ring, CHILD: A First Step Towards Continual Learning, 1997, Machine Learning.

[22] Satinder Singh, Transfer of Learning by Composing Solutions of Elemental Sequential Tasks, 1992, Mach. Learn.

[23] D. M. Wolpert et al., Multiple paired forward and inverse models for motor control, 1998, Neural Networks.

[24] Jonas Karlsson et al., Learning via task decomposition, 1993.

[25] Teuvo Kohonen, Self-organization and associative memory: 3rd edition, 1989.

[26] Michael S. A. Graziano, The Intelligent Movement Machine, 2009.

[27] Richard S. Sutton, Learning to Predict by the Methods of Temporal Differences, 1988, Machine Learning.