Self-Organizing Perceptual and Temporal Abstraction for Robot Reinforcement Learning

A major current challenge in reinforcement learning research is to extend methods that work well on discrete, short-range, low-dimensional problems to continuous, high-diameter, high-dimensional problems, such as robot navigation with high-resolution sensors. We present a method whereby a robot in a continuous world can, with little prior knowledge of its sensorimotor system, environment, and task, improve task learning by first using a self-organizing feature map to develop a set of higher-level perceptual features while exploring with primitive, local actions. Then, using those features, the agent can build a set of high-level actions that carry it between perceptually distinctive states in the environment. This method combines a perceptual abstraction of the agent's sensory input into useful perceptual features with a temporal abstraction of the agent's motor output into extended, high-level actions, thus reducing both the dimensionality and the diameter of the task. An experiment on a simulated robot navigation task shows that an agent using this method can learn to perform a task requiring 300 small-scale, local actions with as few as 7 temporally extended, abstract actions, significantly improving learning time.
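The abstract gives no code, so the sketch below is only a rough illustration of the perceptual-abstraction step: a generic Kohonen-style self-organizing map in Python with NumPy. The grid size, learning-rate and neighborhood schedules, and the `sensor_stream` placeholder are all assumptions made for illustration, not the authors' actual parameters; the index of the winning unit stands in for the discrete perceptual feature between which the high-level actions would travel.

```python
import numpy as np

class SOM:
    """Kohonen self-organizing feature map (illustrative sketch, not the
    paper's implementation).

    Quantizes high-dimensional sensor vectors onto a small 2-D grid of
    prototype units; the index of the winning unit serves as a discrete
    perceptual feature label.
    """

    def __init__(self, rows, cols, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.uniform(size=(rows * cols, dim))  # one prototype per unit
        # Grid coordinates of each unit, used for neighborhood distances.
        self.coords = np.array(
            [(r, c) for r in range(rows) for c in range(cols)], dtype=float
        )

    def winner(self, x):
        """Index of the unit whose prototype is closest to sensor vector x."""
        return int(np.argmin(np.linalg.norm(self.weights - x, axis=1)))

    def train_step(self, x, lr, sigma):
        """Online update: pull the winner and its grid neighbors toward x."""
        w = self.winner(x)
        grid_dist = np.linalg.norm(self.coords - self.coords[w], axis=1)
        h = np.exp(-grid_dist**2 / (2.0 * sigma**2))  # Gaussian neighborhood
        self.weights += lr * h[:, None] * (x - self.weights)


# Hypothetical usage: train on sensor snapshots gathered during exploration
# with primitive, local actions, then label states by their winning unit.
som = SOM(rows=5, cols=5, dim=180)          # e.g. a 180-ray range-sensor scan
sensor_stream = np.random.rand(2000, 180)   # placeholder for real readings
for t, scan in enumerate(sensor_stream):
    frac = t / len(sensor_stream)           # decay lr and sigma over time
    som.train_step(scan, lr=0.5 * (1 - frac) + 0.01, sigma=3.0 * (1 - frac) + 0.5)
feature = som.winner(sensor_stream[-1])     # discrete perceptual feature
```

Under the same assumptions, the temporal-abstraction step could then be read as defining each high-level action as "apply a local control law until the winning unit changes," i.e., travel from one perceptually distinctive state to an adjacent one.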
