Self-Organizing Perceptual and Temporal Abstraction for Robot Reinforcement Learning

A major current challenge in reinforcement learning research is to extend methods that work well on discrete, short-range, low-dimensional problems to continuous, high-diameter, high-dimensional problems, such as robot navigation with high-resolution sensors. We present a method whereby a robot in a continuous world can, with little prior knowledge of its sensorimotor system, environment, and task, improve task learning by first using a self-organizing feature map to develop a set of higher-level perceptual features while exploring with primitive, local actions. Then, using those features, the agent can build a set of high-level actions that carry it between perceptually distinctive states in the environment. This method combines a perceptual abstraction of the agent's sensory input into useful perceptual features with a temporal abstraction of the agent's motor output into extended, high-level actions, thus reducing both the dimensionality and the diameter of the task. An experiment on a simulated robot navigation task shows that an agent using this method can learn to perform a task requiring 300 small-scale, local actions with as few as 7 temporally extended, abstract actions, significantly improving learning time.
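The abstract gives no code, so the sketch below is only a rough illustration of the perceptual-abstraction step: a generic Kohonen-style self-organizing map in Python with NumPy. The grid size, learning-rate and neighborhood schedules, and the `sensor_stream` placeholder are all assumptions made for illustration, not the authors' actual parameters; the index of the winning unit stands in for the discrete perceptual feature between which the high-level actions would travel.

```python
import numpy as np

class SOM:
    """Kohonen self-organizing feature map (illustrative sketch, not the
    paper's implementation).

    Quantizes high-dimensional sensor vectors onto a small 2-D grid of
    prototype units; the index of the winning unit serves as a discrete
    perceptual feature label.
    """

    def __init__(self, rows, cols, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.uniform(size=(rows * cols, dim))  # one prototype per unit
        # Grid coordinates of each unit, used for neighborhood distances.
        self.coords = np.array(
            [(r, c) for r in range(rows) for c in range(cols)], dtype=float
        )

    def winner(self, x):
        """Index of the unit whose prototype is closest to sensor vector x."""
        return int(np.argmin(np.linalg.norm(self.weights - x, axis=1)))

    def train_step(self, x, lr, sigma):
        """Online update: pull the winner and its grid neighbors toward x."""
        w = self.winner(x)
        grid_dist = np.linalg.norm(self.coords - self.coords[w], axis=1)
        h = np.exp(-grid_dist**2 / (2.0 * sigma**2))  # Gaussian neighborhood
        self.weights += lr * h[:, None] * (x - self.weights)


# Hypothetical usage: train on sensor snapshots gathered during exploration
# with primitive, local actions, then label states by their winning unit.
som = SOM(rows=5, cols=5, dim=180)          # e.g. a 180-ray range-sensor scan
sensor_stream = np.random.rand(2000, 180)   # placeholder for real readings
for t, scan in enumerate(sensor_stream):
    frac = t / len(sensor_stream)           # decay lr and sigma over time
    som.train_step(scan, lr=0.5 * (1 - frac) + 0.01, sigma=3.0 * (1 - frac) + 0.5)
feature = som.winner(sensor_stream[-1])     # discrete perceptual feature
```

Under the same assumptions, the temporal-abstraction step could then be read as defining each high-level action as "apply a local control law until the winning unit changes," i.e., travel from one perceptually distinctive state to an adjacent one.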
