PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem

Most of computer science focuses on automatically solving given computational problems. I focus on automatically inventing or discovering problems in a way inspired by the playful behavior of animals and humans, to train a more and more general problem solver from scratch in an unsupervised fashion. Consider the infinite set of all computable descriptions of tasks with possibly computable solutions. Given a general problem-solving architecture, at any given time, the novel algorithmic framework PowerPlay (Schmidhuber, 2011) searches the space of possible pairs of new tasks and modifications of the current problem solver, until it finds a more powerful problem solver that provably solves all previously learned tasks plus the new one, while the unmodified predecessor does not. Newly invented tasks may require the solver to achieve a wow-effect by making previously learned skills more efficient, such that they require less time and space. New skills may (partially) re-use previously learned skills. The greedy search of typical PowerPlay variants uses time-optimal program search to order candidate pairs of tasks and solver modifications by their conditional computational (time and space) complexity, given the stored experience so far. The new task and its corresponding task-solving skill are the first pair found and validated. This biases the search toward pairs that can be described compactly and validated quickly. The computational costs of validating new tasks need not grow with task repertoire size. Standard problem-solver architectures of personal computers or neural networks tend to generalize by solving numerous tasks outside the self-invented training set; PowerPlay's ongoing search for novelty keeps breaking the generalization abilities of its present solver. This is related to Gödel's sequence of increasingly powerful formal theories based on adding formerly unprovable statements to the axioms without affecting previously provable theorems. The continually increasing repertoire of problem-solving procedures can be exploited by a parallel search for solutions to additional externally posed tasks. PowerPlay may be viewed as a greedy but practical implementation of basic principles of creativity (Schmidhuber, 2006a, 2010). A first experimental analysis can be found in separate papers (Srivastava et al., 2012a,b, 2013).
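To make the search-and-validate loop described above concrete, here is a minimal sketch in Python. It is not the authors' implementation: the task space, solver representation, and candidate ordering are toy stand-ins (squaring tasks solved by a growing lookup table, enumerated in order of a trivial complexity measure) chosen only to illustrate the acceptance test, namely that the modified solver must solve every previously learned task plus the new one, while the unmodified predecessor must fail the new one.

from itertools import count

# Toy domain: task n means "output n * n". A solver is a plain dict mapping
# task ids to answers; a "modification" adds one new entry. All names below
# are illustrative assumptions, not taken from the paper.

def solves(solver, task):
    # True iff the solver produces the correct output for the task.
    return solver.get(task) == task * task

def candidate_pairs(solver):
    # Enumerate (new_task, modified_solver) pairs in order of a toy
    # complexity measure (here simply the task id), standing in for
    # PowerPlay's ordering by conditional computational complexity.
    for task in count(0):
        if task in solver:               # task already in the repertoire
            continue
        modified = dict(solver)
        modified[task] = task * task     # the proposed new skill
        yield task, modified

def powerplay_step(solver, repertoire):
    # One greedy iteration: accept the first candidate pair whose modified
    # solver solves every previously learned task plus the new one, while
    # the unmodified predecessor fails the new one.
    for task, modified in candidate_pairs(solver):
        if solves(solver, task):         # predecessor must NOT already solve it
            continue
        if solves(modified, task) and all(solves(modified, t) for t in repertoire):
            repertoire.append(task)
            return modified
    raise RuntimeError("no acceptable pair found")  # unreachable in this toy domain

solver, repertoire = {}, []
for _ in range(5):
    solver = powerplay_step(solver, repertoire)
print(repertoire)  # -> [0, 1, 2, 3, 4]

In the framework proper, candidate pairs would be produced by time-optimal program search over task descriptions and solver modifications, and validation would include showing that the predecessor fails the new task; the structure of the loop, however, is the same.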

[1] G. G. Stokes, 1890. Quoted in The New Yale Book of Quotations.

[2] K. Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I, 1931.

[4]  Emil L. Post Finite combinatory processes—formulation , 1936, Journal of Symbolic Logic.

[5]  A. Church An Unsolvable Problem of Elementary Number Theory , 1936 .

[6]  A. Turing On Computable Numbers, with an Application to the Entscheidungsproblem. , 1937 .

[7]  C. E. Shannon. A mathematical theory of communication, 1948, Bell System Technical Journal.

[8]  D. Berlyne. Novelty and curiosity as determinants of exploratory behaviour, 1950.

[9]  David A. Huffman,et al.  A method for the construction of minimum-redundancy codes , 1952, Proceedings of the IRE.

[10]  D. Berlyne A theory of human curiosity. , 1954, British journal of psychology.

[11]  J. Piaget The child's construction of reality , 1954 .

[12]  Ray J. Solomonoff,et al.  A Formal Theory of Inductive Inference. Part I , 1964, Inf. Control..

[13]  Ray J. Solomonoff,et al.  A Formal Theory of Inductive Inference. Part II , 1964, Inf. Control..

[14]  A. Kolmogorov Three approaches to the quantitative definition of information , 1968 .

[15]  C. S. Wallace,et al.  An Information Measure for Classification , 1968, Comput. J..

[16]  W. J. Studden,et al.  Theory Of Optimal Experiments , 1972 .

[17]  Ingo Rechenberg,et al.  Evolutionsstrategie : Optimierung technischer Systeme nach Prinzipien der biologischen Evolution , 1973 .

[18]  W. Vent. Review of: Ingo Rechenberg, Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution (Frommann-Holzboog-Verlag, Stuttgart, 1973), 1975.

[19]  Ray J. Solomonoff,et al.  Complexity-based induction systems: Comparisons and convergence theorems , 1978, IEEE Trans. Inf. Theory.

[20]  J. Rissanen,et al.  Modeling By Shortest Data Description* , 1978, Autom..

[21]  C. S. Wallace,et al.  Estimation and Inference by Compact Coding , 1987 .

[22]  Allen Newell et al. GPS, a program that simulates human thought, 1995.

[23]  Paul J. Werbos. Generalization of backpropagation with application to a recurrent gas market model, 1988, Neural Networks.

[24]  Jürgen Schmidhuber,et al.  Dynamische neuronale Netze und das fundamentale raumzeitliche Lernproblem , 1990 .

[25]  Jürgen Schmidhuber,et al.  A possibility for implementing curiosity and boredom in model-building neural controllers , 1991 .

[26]  Jürgen Schmidhuber,et al.  Curious model-building control systems , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.

[27]  Jürgen Schmidhuber,et al.  A Fixed Size Storage O(n3) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks , 1992, Neural Computation.

[28]  Jürgen Schmidhuber,et al.  Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks , 1992, Neural Computation.

[29]  J. Schmidhuber Reducing the Ratio Between Learning Complexity and Number of Time Varying Variables in Fully Recurrent Nets , 1993 .

[30]  Jürgen Schmidhuber,et al.  A ‘Self-Referential’ Weight Matrix , 1993 .

[31]  Ming Li,et al.  An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.

[32]  Mark B. Ring Continual learning in reinforcement environments , 1995, GMD-Bericht.

[33]  Ronald J. Williams,et al.  Gradient-based learning algorithms for recurrent networks and their computational complexity , 1995 .

[34]  S. Hochreiter et al. Reinforcement driven information acquisition in non-deterministic environments, 1995.

[35]  Christopher M. Bishop,et al.  Neural networks for pattern recognition , 1995 .

[36]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[37]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[38]  Melvin Fitting,et al.  First-Order Logic and Automated Theorem Proving , 1990, Graduate Texts in Computer Science.

[39]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[40]  J. Schmidhuber. What's interesting?, 1997.

[41]  Paul M. B. Vitányi,et al.  An Introduction to Kolmogorov Complexity and Its Applications , 1997, Graduate Texts in Computer Science.

[42]  Jürgen Schmidhuber,et al.  Artificial curiosity based on discovering novel algorithmic predictability through coevolution , 1999, Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406).

[43]  Jürgen Schmidhuber,et al.  Bias-Optimal Incremental Problem Solving , 2002, NIPS.

[44]  Marcus Hutter The Fastest and Shortest Algorithm for all Well-Defined Problems , 2002, Int. J. Found. Comput. Sci..

[45]  Jürgen Schmidhuber,et al.  Exploring the predictable , 2003 .

[46]  Jürgen Schmidhuber,et al.  Optimal Ordered Problem Solver , 2002, Machine Learning.

[47]  Jürgen Schmidhuber,et al.  Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement , 1997, Machine Learning.

[48]  Marcus Hutter. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability, 2005, Texts in Theoretical Computer Science, An EATCS Series.

[49]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[50]  Jürgen Schmidhuber,et al.  Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts , 2006, Connect. Sci..

[51]  Risto Miikkulainen,et al.  Efficient Non-linear Control Through Neuroevolution , 2006, ECML.

[52]  Christopher M. Bishop. Pattern Recognition and Machine Learning, 2006, Springer.

[53]  Jürgen Schmidhuber,et al.  Gödel Machines: Fully Self-referential Optimal Universal Self-improvers , 2007, Artificial General Intelligence.

[54]  Max Lungarella,et al.  Developmental Robotics , 2009, Encyclopedia of Artificial Intelligence.

[55]  P. Vitányi,et al.  An Introduction to Kolmogorov Complexity and Its Applications, Third Edition , 1997, Texts in Computer Science.

[56]  Risto Miikkulainen,et al.  Accelerated Neural Evolution through Cooperatively Coevolved Synapses , 2008, J. Mach. Learn. Res..

[57]  Tom Schaul,et al.  Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).

[58]  Jürgen Schmidhuber,et al.  Ultimate Cognition à la Gödel , 2009, Cognitive Computation.

[59]  Jürgen Schmidhuber,et al.  Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010) , 2010, IEEE Transactions on Autonomous Mental Development.

[60]  Frank Sehnke,et al.  Parameter-exploring policy gradients , 2010, Neural Networks.

[61]  Yi Sun,et al.  Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments , 2011, AGI.

[62]  Tom Schaul,et al.  Curiosity-driven optimization , 2011, 2011 IEEE Congress of Evolutionary Computation (CEC).

[63]  Mark Ring. Compression Progress-Based Curiosity Drive for Developmental Learning, 2011.

[64]  Jürgen Schmidhuber et al. Artificial curiosity with planning for autonomous perceptual and cognitive development, 2011, IEEE International Conference on Development and Learning (ICDL).

[65]  Faustino J. Gomez,et al.  Intrinsically Motivated Evolutionary Search for Vision-Based Reinforcement Learning , 2011 .

[66]  Jürgen Schmidhuber,et al.  Self-Delimiting Neural Networks , 2012, ArXiv.

[67]  Jürgen Schmidhuber,et al.  Continually adding self-invented problems to the repertoire: First experiments with POWERPLAY , 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).

[69]  Peter Dayan,et al.  Exploration from Generalization Mediated by Multiple Controllers , 2013, Intrinsically Motivated Learning in Natural and Artificial Systems.

[70]  Kevin M. Small,et al.  Estimation and Inference , 2013 .

[71]  Christoph Salge,et al.  Approximation of Empowerment in the continuous Domain , 2013, Adv. Complex Syst..

[72]  Jürgen Schmidhuber,et al.  First Experiments with PowerPlay , 2012, Neural networks : the official journal of the International Neural Network Society.

[73]  Pierre-Yves Oudeyer,et al.  Intrinsically Motivated Learning of Real-World Sensorimotor Skills with Developmental Constraints , 2013, Intrinsically Motivated Learning in Natural and Artificial Systems.

[74]  T. Martin McGinnity,et al.  Novelty Detection as an Intrinsic Motivation for Cumulative Learning Robots , 2013, Intrinsically Motivated Learning in Natural and Artificial Systems.

[75]  Andrew G. Barto,et al.  Intrinsic Motivation and Reinforcement Learning , 2013, Intrinsically Motivated Learning in Natural and Artificial Systems.
