Artificial curiosity based on discovering novel algorithmic predictability through coevolution

An agent explores a spatio-temporal domain by making predictions, then learning from success or failure what is predictable and what is not. The author studies a "curious" embedded agent that differs from previous explorers in that it can limit its predictions to fairly arbitrary, computable aspects of event sequences, and can thus explicitly ignore almost arbitrary unpredictable, random aspects. The agent constructs initially random algorithms mapping event sequences to abstract internal representations (IRs), as well as algorithms predicting IRs from IRs computed earlier. Its goal is to learn novel algorithms that create IRs useful for correct IR predictions, without wasting time on those learned before. This is achieved by a coevolutionary scheme involving two competing modules that collectively design single algorithms to be executed. The modules can bet on the outcomes of IR predictions computed by the algorithms they have agreed upon. If their opinions differ, the system checks who is right, punishes the loser (the surprised module), and rewards the winner. A reinforcement learning algorithm forces each module to maximise reward. This motivates both modules to lure the other into agreeing on algorithms whose predictions will surprise it. Since each module can essentially veto algorithms it does not consider profitable, the system is driven to focus on those computable aspects of the environment where both modules still hold confident but differing opinions. Once both share the same opinion on a particular issue, the winner loses a source of reward, which is an incentive to shift the focus of interest onto novel, yet unknown algorithms.
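
To make the zero-sum betting mechanism concrete, here is a minimal Python sketch of one settlement step. All names (`Module`, `settle_bet`, the random placeholder opinions) are illustrative assumptions, not the paper's implementation, which couples the bets to reinforcement learning of the modules' policies.

```python
import random

class Module:
    """One of the two competing modules. Its 'opinion' is a yes/no bet on
    whether an agreed-upon IR prediction will turn out to be correct."""

    def __init__(self, name: str):
        self.name = name
        self.reward = 0.0

    def opinion(self) -> bool:
        # Placeholder: a real module would derive a confident bet from what
        # it has learned about the environment; here it simply guesses.
        return random.random() < 0.5

def settle_bet(a: Module, b: Module, prediction_was_correct: bool) -> None:
    """Settle one bet on an executed algorithm's IR prediction: if the two
    opinions differ, reward the winner and punish the loser (the surprised
    module) by the same amount, making the interaction zero-sum."""
    bet_a, bet_b = a.opinion(), b.opinion()
    if bet_a == bet_b:
        return  # shared opinion: no surprise possible, no reward to be earned
    winner, loser = (a, b) if bet_a == prediction_was_correct else (b, a)
    winner.reward += 1.0
    loser.reward -= 1.0

if __name__ == "__main__":
    left, right = Module("left"), Module("right")
    for _ in range(100):
        # The environment determines whether the IR prediction actually held.
        settle_bet(left, right, prediction_was_correct=random.random() < 0.5)
    print(left.reward, right.reward)
```

Note how the early return on agreement captures the incentive structure described above: once both modules share the same opinion, the bet yields nothing, so reward can only be found by shifting attention to algorithms where confident opinions still differ.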
