Probabilistic Incremental Program Evolution

Probabilistic Incremental Program Evolution (PIPE) is a novel technique for automatic program synthesis. We combine probability vector coding of program instructions, Population Based Incremental Learning, and tree-coded programs like those used in some variants of Genetic Programming (GP). PIPE iteratively generates successive populations of functional programs according to an adaptive probability distribution over all possible programs. Each iteration it uses the best program to reene the distribution. Thus, it stochastically generates better and better programs. Since distribution reenements depend only on the best program of the current population, PIPE can evaluate program populations eeciently when the goal is to discover a program with minimal runtime. We compare PIPE to GP on a function regression problem and the 6-bit parity problem. We also use PIPE to solve tasks in partially observable mazes, where the best programs have minimal runtime.

[1]  Leonid A. Levin,et al.  Randomness Conservation Inequalities; Information and Independence in Mathematical Theories , 1984, Inf. Control..

[2]  Nichael Lynn Cramer,et al.  A Representation for the Adaptive Generation of Simple Sequential Programs , 1985, ICGA.

[3]  Ray J. Solomonoff,et al.  The Application of Algorithmic Probability to Problems in Artificial Intelligence , 1985, UAI.

[4]  C. Watkins Learning from delayed rewards , 1989 .

[5]  Jürgen Schmidhuber,et al.  Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.

[6]  Dana H. Ballard,et al.  Active Perception and Reinforcement Learning , 1990, Neural Computation.

[7]  Long-Ji Lin,et al.  Reinforcement learning for robots using neural networks , 1992 .

[8]  Lonnie Chrisman,et al.  Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.

[9]  John R. Koza,et al.  Genetic programming - on the programming of computers by means of natural selection , 1993, Complex adaptive systems.

[10]  Ming Li,et al.  An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.

[11]  Astro Teller,et al.  The evolution of mental models , 1994 .

[12]  Michael L. Littman,et al.  Memoryless policies: theoretical limitations and practical results , 1994 .

[13]  Juergen Schmidhuber,et al.  On learning how to learn learning strategies , 1994 .

[14]  Michael I. Jordan,et al.  Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.

[15]  Dave Cliff,et al.  Adding Temporary Memory to ZCS , 1994, Adapt. Behav..

[16]  Mark B. Ring Continual learning in reinforcement environments , 1995, GMD-Bericht.

[17]  Una-May O'Reilly,et al.  An analysis of genetic programming , 1995 .

[18]  S. Baluja An Empirical Comparison of Seven Iterative and Evolutionary Function Optimization Heuristics , 1995 .

[19]  Rich Caruana,et al.  Removing the Genetics from the Standard Genetic Algorithm , 1995, ICML.

[20]  Corso Elvezia Hq-learning: Discovering Markovian Subgoals for Non-markovian Reinforcement Learning , 1996 .

[21]  Andrew McCallum,et al.  Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks , 1996 .

[22]  Paul A. Viola,et al.  MIMIC: Finding Optima by Estimating Probability Densities , 1996, NIPS.

[23]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[24]  Jürgen Schmidhuber,et al.  Solving POMDPs with Levin Search and EIRA , 1996, ICML.

[25]  Juergen Schmidhuber,et al.  Incremental self-improvement for life-time multi-agent reinforcement learning , 1996 .

[26]  Jürgen Schmidhuber,et al.  Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability , 1997, Neural Networks.

[27]  S. Baluja,et al.  Using Optimal Dependency-Trees for Combinatorial Optimization: Learning the Structure of the Search Space , 1997 .

[28]  Shumeet Baluja,et al.  Using Optimal Dependency-Trees for Combinational Optimization , 1997, ICML.

[29]  Jürgen Schmidhuber,et al.  On Learning Soccer Strategies , 1997, ICANN.

[30]  Leslie Pack Kaelbling,et al.  Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..

[31]  Jürgen Schmidhuber,et al.  Reinforcement Learning with Self-Modifying Policies , 1998, Learning to Learn.