Self-Motivated Development Through Rewards for Predictor Errors / Improvements

Even in the absence of external reward, babies, scientists, and others explore their world. Using some sort of adaptive, predictive world model, they improve their ability to answer questions such as: what happens if I do this or that? They lose interest both in things that are already predictable and in things predicted to remain unpredictable despite some effort. We can design curious, self-motivated robots that do the same. The author's basic principles for doing so are old: a reinforcement learning (RL) controller is rewarded whenever its action sequences result in predictor errors (1990) or, more generally, in predictor improvements (1991). We briefly review the history of these ideas.
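
To make the two intrinsic reward signals concrete, here is a minimal sketch in Python. The toy linear predictor, the toy environment dynamics, and all names (predict, train_step, r_error, r_progress) are illustrative assumptions, not the original implementations (which used neural predictors): the 1990-style reward is the predictor's current error, while the 1991-style reward is the error reduction (learning progress) achieved by one predictor update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy adaptive world model: a linear predictor of the next observation.
# (Illustrative assumption; the original work used neural network predictors.)
W = np.zeros((4, 4))
LR = 0.1

def predict(obs):
    return W @ obs

def train_step(obs, next_obs):
    """One gradient step on squared prediction error.
    Returns the mean squared error before and after the update."""
    global W
    err = predict(obs) - next_obs
    err_before = float(np.mean(err ** 2))
    W -= LR * np.outer(err, obs)  # gradient of 0.5 * ||W @ obs - next_obs||^2
    err_after = float(np.mean((predict(obs) - next_obs) ** 2))
    return err_before, err_after

# Hidden true dynamics of the toy environment (unknown to the predictor).
A = 0.5 * np.eye(4)
obs = rng.normal(size=4)

for step in range(10):
    next_obs = A @ obs + 0.01 * rng.normal(size=4)
    err_before, err_after = train_step(obs, next_obs)
    r_error = err_before                  # 1990: intrinsic reward = predictor error
    r_progress = err_before - err_after   # 1991: intrinsic reward = predictor improvement
    print(f"step {step}: error reward {r_error:.4f}, progress reward {r_progress:.4f}")
    obs = next_obs
```

An RL controller that maximizes the first signal seeks out surprising inputs; one that maximizes the second also loses interest in inputs that stay unpredictable despite training, matching the boredom described above.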

[1] K. Gödel, Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I (On formally undecidable propositions of Principia Mathematica and related systems I), 1931.

[2] Jürgen Schmidhuber, Curious model-building control systems, Proceedings of the 1991 IEEE International Joint Conference on Neural Networks, 1991.

[3] Sebastian Thrun and Lorien Pratt (eds.), Learning to Learn, Springer US, 1998.

[4] Barak A. Pearlmutter, Gradient calculations for dynamic recurrent neural networks: a survey, IEEE Transactions on Neural Networks, 1995.

[5] A. M. Turing, On computable numbers, with an application to the Entscheidungsproblem, Proceedings of the London Mathematical Society, 1937.

[6] R. J. Solomonoff, Complexity-based induction systems: comparisons and convergence theorems, IEEE Transactions on Information Theory, 1978.

[7] Jenq-Neng Hwang, et al., Query-based learning applied to partially trained multilayer perceptrons, IEEE Transactions on Neural Networks, 1991.

[8] Ronald J. Williams and David Zipser, Gradient-based learning algorithms for recurrent networks and their computational complexity, 1995.

[9] J. Storck, S. Hochreiter, and J. Schmidhuber, Reinforcement driven information acquisition in non-deterministic environments, 1995.

[10] Jürgen Schmidhuber, A possibility for implementing curiosity and boredom in model-building neural controllers, 1991.

[11] Jürgen Schmidhuber, Reinforcement Learning in Markovian and Non-Markovian Environments, NIPS, 1990.

[12] Frédéric Kaplan and Pierre-Yves Oudeyer, Maximizing Learning Progress: An Internal Reward System for Development, Embodied Artificial Intelligence, 2003.

[13] C. A. Nelson, et al., Learning to Learn, Encyclopedia of Machine Learning and Data Mining, 2017.

[14] Jürgen Schmidhuber, Goedel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements, arXiv, 2003.

[15] Jürgen Schmidhuber, A Fixed Size Storage O(n³) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks, Neural Computation, 1992.

[16] Ray J. Solomonoff, A Formal Theory of Inductive Inference. Part I, Information and Control, 1964.

[17] Marcus Hutter, Universal Artificial Intelligence, 2004.

[18] A. G. Barto, S. Singh, and N. Chentanez, Intrinsically Motivated Learning of Hierarchical Collections of Skills, 2004.

[19] Jürgen Schmidhuber, et al., Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement, Machine Learning, 1997.

[20] Ray J. Solomonoff, A Formal Theory of Inductive Inference. Part II, Information and Control, 1964.

[21] Chris Watkins, Learning from delayed rewards, PhD thesis, 1989.

[22] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[23] David J. C. MacKay, Information-Based Objective Functions for Active Data Selection, Neural Computation, 1992.

[24] Marcus Hutter, Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability, Texts in Theoretical Computer Science, An EATCS Series, 2005.

[25] Ming Li and Paul M. B. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Texts in Computer Science, 2019.

[26] S. Singh, A. G. Barto, and N. Chentanez, Intrinsically Motivated Reinforcement Learning, NIPS, 2004.

[27] Paul J. Werbos, Generalization of backpropagation with application to a recurrent gas market model, Neural Networks, 1988.

[28] Garrison W. Cottrell, et al., Learning Mackey-Glass from 25 Examples, Plus or Minus 2, NIPS, 1993.

[29] V. V. Fedorov, Theory of Optimal Experiments (translated by W. J. Studden and E. M. Klimko), 1972.

[30] Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore, Reinforcement Learning: A Survey, Journal of Artificial Intelligence Research, 1996.

[31] Jürgen Schmidhuber, et al., Reinforcement Learning with Self-Modifying Policies, in Learning to Learn, 1998.

[32] Jürgen Schmidhuber, Exploring the predictable, 2003.

[33] David A. Cohn, et al., Neural Network Exploration Using Optimal Experiment Design, NIPS, 1993.