Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration

Intrinsic motivation (Baranes and Oudeyer, 2009; Oudeyer and Kaplan, 2009; Oudeyer et al., 2007; Schmidhuber, 1991, 2010) can be a powerful concept for endowing an agent with an automated mechanism to continuously explore its environment in the absence of task information. One common way to implement intrinsic motivation is to train a predictive model alongside the agent's policy and to use the model's prediction error as a reward signal for the agent, encouraging the exploration of previously unfamiliar transitions in the environment; this method is also known as curiosity learning (Pathak et al., 2017). Curiosity-like reward schemes have been used in different ways to facilitate exploration in sparse-reward tasks (Burda et al., 2018b; Houthooft et al., 2016) or to pre-train policy networks before fine-tuning them on difficult downstream tasks (Sekar et al., 2020). In environments where the main task objective is highly correlated with thorough exploration, curiosity-based approaches have also been shown to solve the main task without any additional reward signal (Burda et al., 2018a).
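
The following is a minimal sketch of such a prediction-error reward, not the implementation used in any of the cited works: it assumes low-dimensional observations, continuous actions, and a simple linear forward model, and the class name `CuriosityModule` and all hyperparameters are illustrative.

```python
import numpy as np

class CuriosityModule:
    """Prediction-error ("curiosity") intrinsic reward with a linear forward model."""

    def __init__(self, obs_dim, act_dim, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Linear forward model: predicts the next observation from (obs, action).
        self.W = rng.normal(scale=0.1, size=(obs_dim, obs_dim + act_dim))
        self.lr = lr

    def intrinsic_reward(self, obs, action, next_obs):
        x = np.concatenate([obs, action])
        prediction = self.W @ x
        error = next_obs - prediction
        # High reward for transitions the model cannot yet predict.
        reward = 0.5 * float(error @ error)
        # One SGD step on the squared prediction error: transitions that are
        # visited repeatedly become predictable and therefore less rewarding.
        self.W += self.lr * np.outer(error, x)
        return reward


# Illustrative usage with dummy data; in practice the intrinsic reward would be
# fed to the policy's RL update, optionally added to any extrinsic task reward.
curiosity = CuriosityModule(obs_dim=4, act_dim=2)
obs, action, next_obs = np.zeros(4), np.ones(2), np.ones(4)
print(curiosity.intrinsic_reward(obs, action, next_obs))
```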

[1] Chrystopher L. Nehaniv et al. Empowerment: a universal agent-centric measure of control. IEEE Congress on Evolutionary Computation, 2005.

[2] Pierre-Yves Oudeyer et al. Intrinsic Motivation Systems for Autonomous Mental Development. IEEE Transactions on Evolutionary Computation, 2007.

[3] Daan Wierstra et al. Variational Intrinsic Control. ICLR, 2016.

[4] Jürgen Schmidhuber et al. PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem. Frontiers in Psychology, 2011.

[5] Deepak Pathak et al. Self-Supervised Exploration via Disagreement. ICML, 2019.

[6] Pierre-Yves Oudeyer et al. Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress. NIPS, 2012.

[7] Filip De Turck et al. VIME: Variational Information Maximizing Exploration. NIPS, 2016.

[8] Jürgen Schmidhuber et al. A possibility for implementing curiosity and boredom in model-building neural controllers. 1991.

[9] Jürgen Schmidhuber et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010). IEEE Transactions on Autonomous Mental Development, 2010.

[10] Dushyant Rao et al. Data-efficient Hindsight Off-policy Option Learning. ICML, 2021.

[11] S. Shankar Sastry et al. Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning. arXiv, 2017.

[12] Sham M. Kakade et al. Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control. ICLR, 2018.

[13] Amos J. Storkey et al. Exploration by Random Network Distillation. ICLR, 2018.

[14] Jost Tobias Springenberg et al. Simple Sensor Intentions for Exploration. arXiv, 2020.

[15] Pieter Abbeel et al. Planning to Explore via Self-Supervised World Models. ICML, 2020.

[16] Martin A. Riedmiller et al. Compositional Transfer in Hierarchical Reinforcement Learning. Robotics: Science and Systems, 2019.

[17] Yuval Tassa et al. Maximum a Posteriori Policy Optimisation. ICLR, 2018.

[18] Michael McCloskey et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem. 1989.

[19] Martin A. Riedmiller et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch. ICML, 2018.

[20] Tom Schaul et al. Unifying Count-Based Exploration and Intrinsic Motivation. NIPS, 2016.

[21] Doina Precup et al. An information-theoretic approach to curiosity-driven reinforcement learning. Theory in Biosciences, 2012.

[22] Jürgen Schmidhuber et al. First Experiments with PowerPlay. Neural Networks, 2012.

[23] Ilya Kostrikov et al. Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play. ICLR, 2017.

[24] Pierre-Yves Oudeyer et al. What is Intrinsic Motivation? A Typology of Computational Approaches. Frontiers in Neurorobotics, 2007.

[25] Alexei A. Efros et al. Curiosity-Driven Exploration by Self-Supervised Prediction. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017.

[26] Pierre-Yves Oudeyer et al. R-IAC: Robust Intrinsically Motivated Exploration and Active Learning. IEEE Transactions on Autonomous Mental Development, 2009.

[27] Sergey Levine et al. Learning Latent Plans from Play. CoRL, 2019.

[28] Sandy H. Huang et al. Learning Gentle Object Manipulation with Curiosity-Driven Deep Reinforcement Learning. arXiv, 2019.

[29] Wojciech Zaremba et al. Asymmetric self-play for automatic goal discovery in robotic manipulation. arXiv, 2021.

[30] Sergey Levine et al. Diversity is All You Need: Learning Skills without a Reward Function. ICLR, 2018.

[31] Trevor Darrell et al. Loss is its own Reward: Self-Supervision for Reinforcement Learning. ICLR, 2016.

[32] Nuttapong Chentanez et al. Intrinsically Motivated Reinforcement Learning. NIPS, 2004.

[33] Martin Riedmiller et al. Representation Matters: Improving Perception and Exploration for Robotics. IEEE International Conference on Robotics and Automation (ICRA), 2021.

[34] Karol Hausman et al. Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning. Robotics: Science and Systems, 2020.

[35] Daniel L. K. Yamins et al. Learning to Play with Intrinsically-Motivated Self-Aware Agents. NeurIPS, 2018.

[36] Sergey Levine et al. Dynamics-Aware Unsupervised Discovery of Skills. ICLR, 2019.

[37] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.

[38] Shakir Mohamed et al. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning. NIPS, 2015.

[39] Andrei A. Rusu et al. Embracing Change: Continual Learning in Deep Neural Networks. Trends in Cognitive Sciences, 2020.