Curiosity-driven reinforcement learning for motion planning on humanoids

Most previous work on artificial curiosity (AC) and intrinsic motivation focuses on basic concepts and theory. Experimental results are generally limited to toy scenarios, such as navigation in a simulated maze or control of a simple mechanical system with one or two degrees of freedom. To study AC in a more realistic setting, we embody a curious agent in the complex iCub humanoid robot. Our novel reinforcement learning (RL) framework consists of a state-of-the-art, low-level, reactive control layer, which controls the iCub while respecting constraints, and a high-level curious agent, which explores the iCub's state-action space by maximizing information gain, learning a world model from experience while controlling the actual iCub hardware in real time. To the best of our knowledge, this is the first embodied, curious agent for real-time motion planning on a humanoid. We demonstrate that it can learn compact Markov models to represent large regions of the iCub's configuration space, and that the iCub explores intelligently, showing interest in its physical constraints as well as in objects it finds in its environment.
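The core idea of the curious agent can be illustrated on a toy discrete Markov model. The sketch below is not the paper's implementation; it is a minimal, hypothetical example of information-gain-maximizing exploration: the agent keeps Dirichlet pseudo-counts over transitions, scores each action by the expected information gain (KL divergence between the predictive transition distribution after and before a hypothetical observation), and greedily takes the most informative action. The environment sizes, pseudo-count prior, and step budget are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A = 5, 2  # toy numbers of states and actions (illustrative)

# True (unknown to the agent) transition probabilities of the toy world.
P_true = rng.dirichlet(np.ones(S), size=(S, A))

# Agent's world model: Dirichlet(1) pseudo-counts per (state, action).
counts = np.ones((S, A, S))

def info_gain(s, a, s2):
    """KL(new predictive || old predictive) after observing s, a -> s2."""
    old = counts[s, a] / counts[s, a].sum()
    c = counts[s, a].copy()
    c[s2] += 1.0
    new = c / c.sum()
    return float(np.sum(new * np.log(new / old)))

def expected_gain(s, a):
    """Expected information gain of action a in state s under the model."""
    p = counts[s, a] / counts[s, a].sum()
    return sum(p[s2] * info_gain(s, a, s2) for s2 in range(S))

s = 0
for _ in range(2000):
    # Curious policy: greedily pick the action whose outcome is least known.
    a = max(range(A), key=lambda act: expected_gain(s, act))
    s2 = rng.choice(S, p=P_true[s, a])
    counts[s, a, s2] += 1.0
    s = s2

# Learned transition model; its error shrinks as exploration proceeds.
model = counts / counts.sum(axis=-1, keepdims=True)
print(np.abs(model - P_true).mean())
```

The information gain here is computed between predictive (categorical) distributions rather than full Dirichlet posteriors, a common simplification; the paper's agent applies the same principle to models of the iCub's configuration space rather than a toy MDP.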
