论文信息 - One Big Net For Everything

One Big Net For Everything

I apply recent work on "learning to think" (2015) and on PowerPlay (2011) to the incremental training of an increasingly general problem solver, continually learning to solve new tasks without forgetting previous skills. The problem solver is a single recurrent neural network (or similar general purpose computer) called ONE. ONE is unusual in the sense that it is trained in various ways, e.g., by black box optimization / reinforcement learning / artificial evolution as well as supervised / unsupervised learning. For example, ONE may learn through neuroevolution to control a robot through environment-changing actions, and learn through unsupervised gradient descent to predict future inputs and vector-valued reward signals as suggested in 1990. User-given tasks can be defined through extra goal-defining input patterns, also proposed in 1990. Suppose ONE has already learned many skills. Now a copy of ONE can be re-trained to learn a new skill, e.g., through neuroevolution without a teacher. Here it may profit from re-using previously learned subroutines, but it may also forget previous skills. Then ONE is retrained in PowerPlay style (2011) on stored input/output traces of (a) ONE's copy executing the new skill and (b) previous instances of ONE whose skills are still considered worth memorizing. Simultaneously, ONE is retrained on old traces (even those of unsuccessful trials) to become a better predictor, without additional expensive interaction with the enviroment. More and more control and prediction skills are thus collapsed into ONE, like in the chunker-automatizer system of the neural history compressor (1991). This forces ONE to relate partially analogous skills (with shared algorithmic information) to each other, creating common subroutines in form of shared subnetworks of ONE, to greatly speed up subsequent learning of additional, novel but algorithmically related skills.

Jürgen Schmidhuber | J. Schmidhuber

[1] K. Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I , 1931 .

[2] Emil L. Post. Finite combinatory processes—formulation , 1936, Journal of Symbolic Logic.

[3] A. Church. An Unsolvable Problem of Elementary Number Theory , 1936 .

[4] A. Turing. On Computable Numbers, with an Application to the Entscheidungsproblem. , 1937 .

[5] Ray J. Solomonoff,et al. A Formal Theory of Inductive Inference. Part I , 1964, Inf. Control..

[6] Ray J. Solomonoff,et al. A Formal Theory of Inductive Inference. Part II , 1964, Inf. Control..

[7] Gregory J. Chaitin,et al. On the Length of Programs for Computing Finite Binary Sequences , 1966, JACM.

[8] Alekseĭ Grigorʹevich Ivakhnenko,et al. CYBERNETIC PREDICTING DEVICES , 1966 .

[9] A. Kolmogorov. Three approaches to the quantitative definition of information , 1968 .

[10] A. G. Ivakhnenko,et al. Polynomial Theory of Complex Systems , 1971, IEEE Trans. Syst. Man Cybern..

[11] Paul J. Werbos,et al. Applications of advances in nonlinear sensitivity analysis , 1982 .

[12] J. Hérault,et al. Réseau de neurones à synapses modifiables: décodage de messages sensoriels composites par apprentissage non supervisé et permanent , 1984 .

[13] PAUL J. WERBOS,et al. Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.

[14] Jürgen Schmidhuber,et al. A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks , 1989 .

[15] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[16] Peter M. Todd,et al. Designing Neural Networks using Genetic Algorithms , 1989, ICGA.

[17] Jürgen Schmidhuber,et al. Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.

[18] T. Sejnowski,et al. Learning Algorithms for Networks with Internal and External Feedback , 1990 .

[19] Jürgen Schmidhuber,et al. An on-line algorithm for dynamic reinforcement learning and planning in reactive environments , 1990, 1990 IJCNN International Joint Conference on Neural Networks.

[20] Jürgen Schmidhuber,et al. A possibility for implementing curiosity and boredom in model-building neural controllers , 1991 .

[21] Christian Jutten,et al. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture , 1991, Signal Process..

[22] Jürgen Schmidhuber,et al. Learning to generate sub-goals for action sequences , 1991 .

[23] Long Ji Lin,et al. Programming Robots Using Reinforcement Learning and Teaching , 1991, AAAI.

[24] Jürgen Schmidhuber,et al. Learning to Generate Artificial Fovea Trajectories for Target Detection , 1991, Int. J. Neural Syst..

[25] Schuster. Learning by maximizing the information transfer through nonlinear noisy neurons and "noise breakdown" , 1992, Physical review. A, Atomic, molecular, and optical physics.

[26] Jürgen Schmidhuber,et al. Learning Complex, Extended Sequences Using the Principle of History Compression , 1992, Neural Computation.

[27] J. Urgen Schmidhuber,et al. Learning Factorial Codes by Predictability Minimization , 1992 .

[28] Narendra Ahuja,et al. Cresceptron: a self-organizing neural network which grows adaptively , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[29] Jürgen Schmidhuber,et al. Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks , 1992, Neural Computation.

[30] Michael C. Mozer,et al. A Connectionist Symbol Manipulator that Discovers the Structure of Context-Free Languages , 1992, NIPS.

[31] Jürgen Schmidhuber,et al. Planning simple trajectories using neural subgoal generators , 1993 .

[32] Jürgen Schmidhuber,et al. A ‘Self-Referential’ Weight Matrix , 1993 .

[33] Ming Li,et al. An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.

[34] Jürgen Schmidhuber,et al. Netzwerkarchitekturen, Zielfunktionen und Kettenregel , 1993 .

[35] Xin Yao,et al. A review of evolutionary artificial neural networks , 1993, Int. J. Intell. Syst..

[36] Karl Sims,et al. Evolving virtual creatures , 1994, SIGGRAPH.

[37] Mark B. Ring. Continual learning in reinforcement environments , 1995, GMD-Bericht.

[38] Ronald J. Williams,et al. Gradient-based learning algorithms for recurrent networks and their computational complexity , 1995 .

[39] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[40] Jürgen Schmidhuber,et al. Sequential neural text compression , 1996, IEEE Trans. Neural Networks.

[41] Jürgen Schmidhuber,et al. Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability , 1997, Neural Networks.

[42] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[43] David E. Moriarty,et al. Symbiotic Evolution of Neural Networks in Sequential Decision Tasks , 1997 .

[44] Risto Miikkulainen,et al. Incremental Evolution of Complex General Behavior , 1997, Adapt. Behav..

[45] Corso Elvezia. Neural Predictors for Detecting and Removing Redundant Information , 1998 .

[46] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[47] Terrence J. Sejnowski,et al. Unsupervised Learning , 2018, Encyclopedia of GIS.

[48] Jürgen Schmidhuber,et al. Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[49] Sepp Hochreiter,et al. Learning to Learn Using Gradient Descent , 2001, ICANN.

[50] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..

[51] Risto Miikkulainen,et al. Active Guidance for a Finless Rocket Using Neuroevolution , 2003, GECCO.

[52] Sven Behnke,et al. Hierarchical Neural Networks for Image Interpretation , 2003, Lecture Notes in Computer Science.

[53] Risto Miikkulainen,et al. Evolving Keepaway Soccer Players through Task Decomposition , 2003, GECCO.

[54] Sridhar Mahadevan,et al. Hierarchical Policy Gradient Algorithms , 2003, ICML.

[55] Risto Miikkulainen,et al. Robust non-linear control through neuroevolution , 2003 .

[56] Douglas Aberdeen,et al. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes , 2003 .

[57] Peter Stone,et al. Policy gradient reinforcement learning for fast quadrupedal locomotion , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.

[58] Jürgen Schmidhuber,et al. Optimal Ordered Problem Solver , 2002, Machine Learning.

[59] Kunihiko Fukushima,et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[60] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[61] Jürgen Schmidhuber,et al. Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts , 2006, Connect. Sci..

[62] Marc'Aurelio Ranzato,et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[63] Jürgen Schmidhuber,et al. An Application of Recurrent Neural Networks to Discriminative Keyword Spotting , 2007, ICANN.

[64] P. Vitányi,et al. An Introduction to Kolmogorov Complexity and Its Applications, Third Edition , 1997, Texts in Computer Science.

[65] Jürgen Schmidhuber,et al. State-Dependent Exploration for Policy Gradient Methods , 2008, ECML/PKDD.

[66] Jürgen Schmidhuber,et al. Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes , 2008, ABiALS.

[67] Tom Schaul,et al. Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).

[68] Tom Schaul,et al. Efficient natural evolution strategies , 2009, GECCO.

[69] Jürgen Schmidhuber,et al. Simple algorithmic theory of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes (特集高次機能の学習と創発--脳・ロボット・人間研究における新たな展開) , 2009 .

[70] Kenneth O. Stanley,et al. A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks , 2009, Artificial Life.

[71] Julian Togelius,et al. Hierarchical controller learning in a First-Person Shooter , 2009, 2009 IEEE Symposium on Computational Intelligence and Games.

[72] Jürgen Schmidhuber,et al. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010) , 2010, IEEE Transactions on Autonomous Mental Development.

[73] Tom Schaul,et al. Multi-Dimensional Deep Memory Atari-Go Players for Parameter Exploring Policy Gradients , 2010, ICANN.

[74] Tom Schaul,et al. Exponential natural evolution strategies , 2010, GECCO '10.

[75] Sven Behnke,et al. Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition , 2010, ICANN.

[76] Jürgen Schmidhuber,et al. Recurrent policy gradients , 2010, Log. J. IGPL.

[77] Jan Peters,et al. Policy Gradient Methods , 2010, Encyclopedia of Machine Learning.

[78] Jürgen Schmidhuber,et al. Self-Delimiting Neural Networks , 2012, ArXiv.

[79] Jürgen Schmidhuber,et al. Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[80] Jürgen Schmidhuber,et al. Evolving large-scale neural networks for vision-based reinforcement learning , 2013, GECCO '13.

[81] Jürgen Schmidhuber,et al. First Experiments with PowerPlay , 2012, Neural networks : the official journal of the International Neural Network Society.

[82] Jürgen Schmidhuber,et al. PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem , 2011, Front. Psychol..

[83] Tom Schaul,et al. A linear time natural evolution strategy for non-separable functions , 2011, GECCO.

[84] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.

[85] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[86] Jürgen Schmidhuber,et al. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models , 2015, ArXiv.

[87] Razvan Pascanu,et al. Progressive Neural Networks , 2016, ArXiv.

[88] Sergio Gomez Colmenarejo,et al. Hybrid computing using a neural network with dynamic external memory , 2016, Nature.

[89] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[90] Jürgen Schmidhuber,et al. A Wavelet-based Encoding for Neuroevolution , 2016, GECCO.

[91] Jürgen Schmidhuber,et al. Neural Expectation Maximization , 2017, NIPS.

[92] Glen Berseth,et al. Progressive Reinforcement Learning with Distillation for Multi-Skilled Motion Control , 2018, ICLR.