Explore to see, learn to perceive, get the actions for free: SKILLABILITY

How can a humanoid robot autonomously learn and refine multiple sensorimotor skills as a byproduct of curiosity-driven exploration over its high-dimensional, unprocessed visual input? We present SKILLABILITY, which makes this possible. It combines the recently introduced Curiosity-Driven Modular Incremental Slow Feature Analysis (Curious Dr. MISFA) with the well-known options framework. Curious Dr. MISFA's objective is to acquire abstractions as quickly as possible; these abstractions map high-dimensional pixel-level vision to a low-dimensional manifold. We find that each learnable abstraction augments the robot's state space (a set of poses) with new information about the environment, for example, whether the robot is grasping a cup. An abstraction is a function on an image, called a slow feature, that can effectively discretize a high-dimensional visual sequence. For example, it maps the sequence of the robot watching its arm as it moves around, grasps randomly, then grasps a cup, and moves around some more while holding the cup, onto a step function with two output levels: one for when the cup is grasped and one for when it is not. The new state space includes this grasped/not-grasped information. Each abstraction is coupled with an option. The reward function for the option's policy (learned through Least Squares Policy Iteration) is high for transitions that produce a large change in the step-function-like slow features. This corresponds to finding bottleneck states, which are known to be good subgoals for hierarchical reinforcement learning; in the example, the subgoal corresponds to grasping the cup. The final skill comprises both the learned policy and the learned abstraction. SKILLABILITY makes our iCub the first humanoid robot to learn complex skills, such as toppling or grasping an object, from raw high-dimensional video input, driven purely by its intrinsic motivations.
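The abstraction-to-reward mechanism lends itself to a compact illustration. Below is a minimal sketch in Python/NumPy of that idea under simplifying assumptions: it uses plain linear batch SFA on a synthetic two-regime signal rather than the incremental, pixel-level version in the paper, and all names in it are hypothetical. It extracts the slowest feature (which comes out step-function-like) and turns its frame-to-frame change into an intrinsic reward that peaks at the regime transition, the analogue of the grasp event.

```python
# Minimal sketch, assuming a toy setting: linear batch SFA on a synthetic
# signal, not the paper's incremental (IncSFA) pixel-level implementation.
# All names here are hypothetical.
import numpy as np

def slow_feature(X):
    """Return the slowest linear feature of a signal X with shape (T, D).

    Classic SFA recipe: whiten the input, then take the direction along
    which the temporal derivative has minimal variance.
    """
    mu = X.mean(axis=0)
    Xc = X - mu                                  # center
    d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = E / np.sqrt(d)                           # whitening matrix (D x D)
    Z = Xc @ W                                   # whitened signal
    dcov = np.cov(np.diff(Z, axis=0), rowvar=False)
    _, dE = np.linalg.eigh(dcov)
    w = W @ dE[:, 0]                             # slowest direction
    return lambda x: (x - mu) @ w

# Toy "video": two visually distinct regimes (e.g. cup not grasped vs.
# grasped), each corrupted by fast per-frame noise.
rng = np.random.default_rng(0)
T, D = 200, 20
X = rng.normal(size=(T, D))
X[T // 2:] += 3.0                                # regime change at t = T/2

phi = slow_feature(X)
y = np.array([phi(x) for x in X])                # step-function-like output

# Intrinsic reward for the option policy: large only where the slow
# feature jumps, i.e. at the bottleneck transition (the grasp event).
reward = np.abs(np.diff(y))
print("transition detected at t =", int(np.argmax(reward)))  # ~ T/2
```

In the full system this reward would drive the option's policy (learned with Least Squares Policy Iteration) toward the bottleneck state, so the skill ends up reliably producing the transition the abstraction detects.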
