LEARNING AWARENESS MODELS

We consider the setting of an agent with a fixed body interacting with an unknown and uncertain external world. We show that models trained to predict proprioceptive information about the agent's body come to represent objects in the external world: although they are trained only on internally available signals, these dynamic body models learn holistic, persistent representations of external objects through the necessity of predicting the objects' effects on the agent's own body. Our dynamics model successfully predicts distributions over 132 sensor readings up to 100 steps into the future, and we demonstrate that even when the body is no longer in contact with an object, the latent variables of the dynamics model continue to represent its shape. We show that active data collection by maximizing the entropy of predictions about the body (touch sensors, proprioception, and vestibular information) leads to dynamics models that perform better when used for control. We also collect data from a real robotic hand and show that the same models can be used to answer questions about properties of objects in the real world. Videos with qualitative results of our models are available at https://goo.gl/mZuqAV.
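
As an illustration of how an entropy-seeking exploration signal of this kind could be computed, the minimal sketch below scores candidate actions by the entropy of a diagonal-Gaussian predictive distribution over the 132 body-sensor readings and selects the most uncertain one. The function names (`predict_sensors`, `select_exploratory_action`), the diagonal-Gaussian assumption, and the candidate-scoring scheme are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def gaussian_entropy(log_var):
    """Entropy of a diagonal Gaussian, summed over sensor dimensions.

    H = 0.5 * sum_i log(2 * pi * e * sigma_i^2), with log_var = log(sigma^2).
    """
    return 0.5 * np.sum(np.log(2.0 * np.pi * np.e) + log_var, axis=-1)


def select_exploratory_action(candidate_actions, predict_sensors):
    """Pick the candidate action whose predicted body-sensor distribution
    has the highest entropy (illustrative entropy-seeking exploration).

    `predict_sensors(action)` is assumed to return (mean, log_var) for the
    132-dimensional sensor reading predicted by a dynamics model.
    """
    entropies = [gaussian_entropy(predict_sensors(a)[1]) for a in candidate_actions]
    return candidate_actions[int(np.argmax(entropies))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    actions = [rng.normal(size=5) for _ in range(8)]

    # Stand-in predictor: larger actions produce more predictive uncertainty.
    dummy_predict = lambda a: (np.zeros(132),
                               np.full(132, np.log(1e-3 + np.sum(a ** 2))))

    best = select_exploratory_action(actions, dummy_predict)
    print(np.linalg.norm(best))  # the largest-norm candidate under the dummy predictor
```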
