Paper information

[Figure: system architecture; labels: Input Observations (x), Switch, modules φL1 … φLk, Accumulator, Intrinsic Motivation Block, Reward]


To learn behaviors autonomously in complex environments, vision-based agents need to develop useful sensory abstractions from high-dimensional video. We propose a modular, curiosity-driven learning system that autonomously learns multiple abstract representations. The policy for building the library of abstractions is adapted through reinforcement learning, and the corresponding abstractions are learned through incremental slow feature analysis (IncSFA). IncSFA learns each abstraction directly from unprocessed visual data, based on how the inputs change over time. Modularity is induced via a gating system, which also prevents duplicate abstractions. The system is driven by a curiosity signal based on how learnable the inputs are for the currently adapting module. Once learning is complete, the result is multiple slow-feature modules that serve as distinct, behavior-specific abstractions. Experiments with a simulated iCub humanoid robot show that the proposed method effectively learns a set of abstractions from raw, unpreprocessed video; to our knowledge, it is the first curious learning agent to demonstrate this ability.
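To illustrate what "learning an abstraction based on how the inputs change over time" means, here is a minimal sketch of plain batch linear slow feature analysis on a 2-D signal. It is not the incremental IncSFA update used in this work, and the helper names `eig_sym2` and `linear_sfa_2d` are hypothetical. The sketch centers and whitens the input, then picks the unit-variance direction whose temporal differences have the smallest variance, i.e. the slowest linear feature.

```python
import math


def eig_sym2(a, b, c):
    """Eigen-decomposition of the symmetric 2x2 matrix [[a, b], [b, c]].

    Returns [(eigenvalue, unit eigenvector), ...] sorted by ascending eigenvalue.
    """
    if abs(b) < 1e-12:
        # Already diagonal: the coordinate axes are the eigenvectors.
        return sorted([(a, (1.0, 0.0)), (c, (0.0, 1.0))], key=lambda p: p[0])
    mid = (a + c) / 2.0
    half_gap = math.hypot((a - c) / 2.0, b)
    pairs = []
    for lam in (mid - half_gap, mid + half_gap):
        # (b, lam - a) solves (A - lam*I) v = 0 whenever b != 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


def linear_sfa_2d(xs):
    """Slowest zero-mean, unit-variance linear feature of a 2-D series xs.

    Hypothetical batch sketch (not IncSFA). Returns (feature values, slowness),
    where slowness is the mean squared temporal difference of the feature.
    """
    n = len(xs)
    m1 = sum(p[0] for p in xs) / n
    m2 = sum(p[1] for p in xs) / n
    centered = [(p[0] - m1, p[1] - m2) for p in xs]

    # Input covariance and whitening: z_i = e_i . x / sqrt(lambda_i).
    c11 = sum(u * u for u, _ in centered) / n
    c12 = sum(u * v for u, v in centered) / n
    c22 = sum(v * v for _, v in centered) / n
    (l1, e1), (l2, e2) = eig_sym2(c11, c12, c22)
    z = [((e1[0] * u + e1[1] * v) / math.sqrt(l1),
          (e2[0] * u + e2[1] * v) / math.sqrt(l2)) for u, v in centered]

    # Covariance of temporal differences of the whitened signal.
    d = [(z[t + 1][0] - z[t][0], z[t + 1][1] - z[t][1]) for t in range(n - 1)]
    m = len(d)
    d11 = sum(u * u for u, _ in d) / m
    d12 = sum(u * v for u, v in d) / m
    d22 = sum(v * v for _, v in d) / m

    # The eigenvector with the smallest eigenvalue is the slowest direction.
    (slowness, w), _ = eig_sym2(d11, d12, d22)
    y = [w[0] * u + w[1] * v for u, v in z]
    return y, slowness
```

For example, mixing a slow sinusoid `s` with a fast one `f` as `xs = [(si + fi, 0.5 * si - fi) for si, fi in zip(s, f)]` and calling `linear_sfa_2d(xs)` should return a feature that matches `s` up to sign and scale. Whitening first is what turns the unit-variance constraint into a simple unit-norm constraint, so the slowest feature reduces to a smallest-eigenvalue problem.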

Marijn F. Stollenga | Varun Raj Kompella | J. Schmidhuber | M. Luciw | L. Pape
