Artificial curiosity with planning for autonomous perceptual and cognitive development

Autonomous agents that learn from reward on high-dimensional visual observations must learn to simplify the raw observations in both space (i.e., dimensionality reduction) and time (i.e., prediction) so that reinforcement learning becomes tractable and effective. Training the spatial and temporal models requires an appropriate sampling scheme, which cannot be hard-coded if the algorithm is to be general. Intrinsic rewards are therefore associated with the samples that best improve the agent's model of the world. Yet the dynamic nature of an intrinsic reward signal is a major obstacle to realizing an efficient curiosity drive: TD-based incremental reinforcement learning approaches fail to adapt quickly enough to exploit the curiosity signal effectively. In this paper, a novel artificial curiosity system with planning is implemented, based on developmental or continual learning principles. Least-squares policy iteration is used with the agent's internal forward model to efficiently assign values for maximizing combined external and intrinsic reward. The properties of this system are illustrated in a high-dimensional, noisy, visual environment that requires the agent to explore. With no useful external value information early on, the self-generated intrinsic values lead to actions that improve both the agent's spatial (perceptual) and temporal (cognitive) models. Curiosity also leads the agent to learn how it could act to maximize external reward.
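To make the two ingredients named in the abstract concrete, the sketch below illustrates (a) an intrinsic reward defined as the improvement of the agent's forward model on each observed transition and (b) least-squares policy iteration (LSTD-Q) applied to the combined external-plus-intrinsic reward. This is a minimal illustration on an assumed toy chain environment, not the paper's high-dimensional visual setup; all names (ForwardModel, lstdq, step) are illustrative assumptions.

```python
# Minimal sketch: curiosity reward = drop in forward-model prediction error,
# combined with external reward and valued by least-squares policy iteration.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, GAMMA = 6, 2, 0.9

def features(s, a):
    """One-hot state-action features for the linear Q-function."""
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi

class ForwardModel:
    """Tabular next-state predictor; error reduction serves as intrinsic reward."""
    def __init__(self):
        self.pred = np.full((N_STATES, N_ACTIONS, N_STATES), 1.0 / N_STATES)
    def error(self, s, a, s_next):
        target = np.eye(N_STATES)[s_next]
        return float(np.sum((self.pred[s, a] - target) ** 2))
    def update(self, s, a, s_next, lr=0.2):
        target = np.eye(N_STATES)[s_next]
        self.pred[s, a] += lr * (target - self.pred[s, a])

def step(s, a):
    """Toy chain dynamics with external reward only at the far end."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r_ext = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, r_ext

def lstdq(samples, policy, gamma=GAMMA, reg=1e-3):
    """One LSTD-Q evaluation: solve A w = b for the linear Q-weights."""
    k = N_STATES * N_ACTIONS
    A, b = reg * np.eye(k), np.zeros(k)
    for s, a, r, s_next in samples:
        phi, phi_next = features(s, a), features(s_next, policy[s_next])
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * r
    return np.linalg.solve(A, b)

model, policy = ForwardModel(), rng.integers(N_ACTIONS, size=N_STATES)
samples, s = [], 0
for t in range(300):
    a = int(policy[s]) if rng.random() > 0.2 else int(rng.integers(N_ACTIONS))
    s_next, r_ext = step(s, a)
    err_before = model.error(s, a, s_next)
    model.update(s, a, s_next)
    r_int = err_before - model.error(s, a, s_next)   # curiosity: model improvement
    samples.append((s, a, r_ext + r_int, s_next))    # combined reward
    if (t + 1) % 50 == 0:                            # policy iteration on stored samples
        w = lstdq(samples, policy)
        policy = np.array([np.argmax([w @ features(st, act) for act in range(N_ACTIONS)])
                           for st in range(N_STATES)])
    s = s_next
print("learned greedy policy:", policy)
```

In this sketch the intrinsic reward fades as the forward model becomes accurate, so early behavior is exploration-driven while later value estimates are dominated by the external reward, mirroring the progression described in the abstract.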
