Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement

We study task sequences that allow for speeding up the learner's average reward intake through appropriate shifts of inductive bias (changes of the learner's policy). To evaluate the long-term effects of bias shifts, including those that merely set the stage for later bias shifts, we use the “success-story algorithm” (SSA). SSA is called occasionally, at times that may depend on the policy itself. It uses backtracking to undo those bias shifts that have not been empirically observed to trigger long-term reward accelerations (measured up to the current SSA call). Bias shifts that survive SSA represent a lifelong success history. Until the next SSA call, they are considered useful and form the basis for additional bias shifts. SSA allows for plugging in a wide variety of learning algorithms. We plug in (1) a novel, adaptive extension of Levin search and (2) a method for embedding the learner's policy modification strategy within the policy itself (incremental self-improvement). Our inductive transfer case studies involve complex, partially observable environments where traditional reinforcement learning fails.
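To make the SSA backtracking step concrete, the following is a minimal Python sketch of the success-story criterion. It is an illustration under assumptions, not the paper's implementation: a bias shift is recorded as an arbitrary undo closure, lifetime and reward are plain floats, and an SSA call pops checkpoints from the top of a stack while the reward per time since the most recent surviving checkpoint fails to exceed the reward per time since the checkpoint before it (with the start of the lifetime as the base case). The names SSALearner, record_bias_shift, and ssa_call are hypothetical.

    class SSALearner:
        def __init__(self):
            self.stack = []     # surviving checkpoints: (time, reward, undo_fn)
            self.time = 0.0     # lifetime consumed so far
            self.reward = 0.0   # cumulative reward so far

        def record_bias_shift(self, undo_fn):
            # Checkpoint the state just before a bias shift is applied,
            # together with a closure that can revert the shift.
            self.stack.append((self.time, self.reward, undo_fn))

        def step(self, dt, r):
            # Advance lifetime and accumulate reward between SSA calls.
            self.time += dt
            self.reward += r

        def ssa_call(self):
            # Backtrack: undo bias shifts not followed by long-term
            # reward acceleration, measured up to this SSA call.
            while self.stack:
                t2, r2, undo = self.stack[-1]
                # Base case: compare against the whole lifetime so far.
                t1, r1 = (self.stack[-2][:2] if len(self.stack) > 1
                          else (0.0, 0.0))
                rate_since_last = (self.reward - r2) / max(self.time - t2, 1e-12)
                rate_since_prev = (self.reward - r1) / max(self.time - t1, 1e-12)
                if rate_since_last > rate_since_prev:
                    break       # success story intact; stop backtracking
                undo()          # revert the most recent surviving shift
                self.stack.pop()

Because each call restores the invariant that reward per time strictly increases along the stack, the surviving checkpoints always correspond to the lifelong success history described above.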
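Likewise, a compact sketch of plain Levin search may help situate the paper's adaptive extension. The sketch below assumes a fixed uniform prior of 2^(-l) over programs of length l drawn from a finite alphabet, and hypothetical callbacks run(program, max_steps) and solved(output); the adaptive extension described in the paper instead updates the probability distribution over programs in the light of experience, which this sketch omits.

    import itertools
    import math

    def levin_search(run, alphabet, solved, c=1.0):
        # Phase T = 1, 2, 4, ...: each program p of length l receives
        # c * 2^(-l) * T interpreter steps before the budget doubles.
        T = 1
        while True:
            max_len = int(math.log2(c * T)) if c * T >= 1 else 0
            for l in range(1, max_len + 1):
                steps = int(c * T * 2.0 ** (-l))
                if steps < 1:
                    continue    # this length enters only in later phases
                for program in itertools.product(alphabet, repeat=l):
                    out = run(program, steps)
                    if out is not None and solved(out):
                        return program  # first program solving the task
            T *= 2

Doubling the phase budget ensures that every candidate program eventually receives enough steps, at a total cost that stays roughly within a multiplicative constant of 2^(l(p)) times the runtime of the fastest solver p, as in Levin's original analysis.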
