On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models

This paper addresses the general problem of reinforcement learning (RL) in partially observable environments. In 2013, our large RL recurrent neural networks (RNNs) learned from scratch to drive simulated cars from high-dimensional video input. However, real brains are more powerful in many ways. In particular, they learn a predictive model of their initially unknown environment, and somehow use it for abstract (e.g., hierarchical) planning and reasoning. Guided by algorithmic information theory, we describe RNN-based AIs (RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending sequences of tasks, some of them provided by the user, others invented by the RNNAI itself in a curious, playful fashion, to improve its RNN-based world model. Unlike our previous model-building RNN-based RL machines dating back to 1990, the RNNAI learns to actively query its model for abstract reasoning, planning, and decision making, essentially "learning to think." The basic ideas of this report can be applied to many other cases where one RNN-like system exploits the algorithmic information content of another. They are taken from a grant proposal submitted in Fall 2014, and also explain concepts such as "mirror neurons." Experimental results will be described in separate papers.
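
To make the controller/model interface concrete, here is a minimal sketch (not the paper's code) of one rollout step in the proposed architecture: a controller RNN C emits, besides an environment action, a learned "query" vector addressed to a world-model RNN M, and M's answer is fed back into C's next input. The tanh cells, layer sizes, and variable names below are illustrative assumptions; the paper's RNNAI would additionally train M as a predictive world model and train C by RL, which this sketch omits.

```python
# Minimal sketch, assuming plain tanh RNN cells and arbitrary sizes.
# C has extra output units to query M and extra input units to read M's answer.
import numpy as np

rng = np.random.default_rng(0)

def rnn_cell(params, x, h):
    """One tanh RNN step: h' = tanh(Wx x + Wh h + b)."""
    Wx, Wh, b = params
    return np.tanh(Wx @ x + Wh @ h + b)

def init_cell(n_in, n_hidden):
    return (0.1 * rng.standard_normal((n_hidden, n_in)),
            0.1 * rng.standard_normal((n_hidden, n_hidden)),
            np.zeros(n_hidden))

OBS, ACT, QRY, ANS, HC, HM = 8, 2, 4, 4, 16, 32  # illustrative dimensions

C = {"cell": init_cell(OBS + ANS, HC),            # controller reads obs + M's answer
     "act": 0.1 * rng.standard_normal((ACT, HC)),
     "qry": 0.1 * rng.standard_normal((QRY, HC))}
M = {"cell": init_cell(OBS + ACT + QRY, HM),      # model reads obs, action, query
     "ans": 0.1 * rng.standard_normal((ANS, HM))}

h_c, h_m = np.zeros(HC), np.zeros(HM)
answer = np.zeros(ANS)

for t in range(5):                      # dummy rollout; random stand-in observations
    obs = rng.standard_normal(OBS)
    h_c = rnn_cell(C["cell"], np.concatenate([obs, answer]), h_c)
    action = np.tanh(C["act"] @ h_c)    # control output for the environment
    query = np.tanh(C["qry"] @ h_c)     # learned "question" addressed to M
    h_m = rnn_cell(M["cell"], np.concatenate([obs, action, query]), h_m)
    answer = np.tanh(M["ans"] @ h_m)    # M's reply, exploited by C at the next step
    print(t, action.round(3), answer.round(3))
```

In a full RNNAI, the reward-maximizing weights of C (including those producing the query vector) would be found by RL or evolutionary search, while M is trained separately to compress the history of observations and actions; the point of the sketch is only the query/answer channel through which C learns to exploit M's algorithmic information content.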
