A Bioinspired Hierarchical Reinforcement Learning Architecture for Modeling Learning of Multiple Skills with Continuous States and Actions

Organisms, and especially primates, are able to learn several skills while avoiding catastrophic interference and enhancing generalisation. This paper proposes a novel hierarchical reinforcement learning (RL) architecture with a number of features that make it suitable to investigate such phenomena. The proposed system combines the mixture-of-experts architecture with the neural-network actor-critic architecture trained with the TD(λ) reinforcement learning algorithm. In particular, responsibility signals provided by two gating networks (one for the actor and one for the critic) are used both to weight the outputs of the respective multiple (expert) controllers and to modulate their learning. The system is tested with a simulated dynamic 2D robotic arm that autonomously learns to reach a target in (up to) three different conditions. The results show that the system is able to appropriately allocate experts to tasks on the basis of the differences and similarities among the required sensorimotor mappings.
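The abstract describes the weighting-and-modulation scheme only verbally, so the sketch below illustrates the core idea under stated assumptions: linear experts and gating maps stand in for the paper's neural networks, and a TD(0) critic update replaces the TD(λ) algorithm for brevity. All names, dimensions, and learning rates are hypothetical, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    STATE_DIM, ACTION_DIM, N_EXPERTS = 4, 2, 3   # illustrative sizes
    GAMMA, ALPHA = 0.99, 0.01                    # discount, learning rate

    # Hypothetical linear stand-ins for the paper's neural-network experts
    # and gating networks (one gate for the actor, one for the critic).
    actor_experts  = rng.normal(scale=0.1, size=(N_EXPERTS, ACTION_DIM, STATE_DIM))
    critic_experts = rng.normal(scale=0.1, size=(N_EXPERTS, STATE_DIM))
    actor_gate     = rng.normal(scale=0.1, size=(N_EXPERTS, STATE_DIM))
    critic_gate    = rng.normal(scale=0.1, size=(N_EXPERTS, STATE_DIM))

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def responsibilities(gate, state):
        # Gating network: one responsibility signal per expert, summing to 1.
        return softmax(gate @ state)

    def policy(state):
        # Responsibility-weighted mixture of the actor experts' outputs.
        r = responsibilities(actor_gate, state)
        proposals = actor_experts @ state        # (N_EXPERTS, ACTION_DIM)
        return (r[:, None] * proposals).sum(axis=0), r

    def value(state):
        # Responsibility-weighted mixture of the critic experts' predictions.
        r = responsibilities(critic_gate, state)
        return float(r @ (critic_experts @ state)), r

    def critic_update(state, reward, next_state, done):
        # TD error of the mixture prediction.
        v, r = value(state)
        v_next = 0.0 if done else value(next_state)[0]
        delta = reward + GAMMA * v_next - v
        # Responsibility-modulated learning: experts the gate deems
        # responsible for this state adapt most, which is what allows
        # the system to allocate different experts to different tasks.
        for i in range(N_EXPERTS):
            critic_experts[i] += ALPHA * delta * r[i] * state
        return delta

    # Minimal usage: act, observe a transition, update the critic.
    s, s_next = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
    action, _ = policy(s)
    td_error = critic_update(s, reward=1.0, next_state=s_next, done=False)

An actor update would be modulated analogously by the actor gate's responsibilities, and the gating weights themselves would also be trained (e.g., routing states toward the experts with the smallest prediction errors); both are omitted here to keep the sketch short.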
