Continual learning in reinforcement environments

Continual learning is the constant development of complex behaviors with no final end in mind. It is the process of learning ever more complicated skills by building on those skills already developed. In order for learning at one stage of development to serve as the foundation for later learning, a continual-learning agent should learn hierarchically. CHILD, an agent capable of Continual, Hierarchical, Incremental Learning and Development, is proposed, described, tested, and evaluated in this dissertation. CHILD accumulates useful behaviors in reinforcement environments by using the Temporal Transition Hierarchies learning algorithm, also derived in the dissertation. This constructive algorithm generates a hierarchical, higher-order neural network that can be used for predicting context-dependent temporal sequences and can learn sequential-task benchmarks more than two orders of magnitude faster than competing neural-network systems. Consequently, CHILD can quickly solve complicated non-Markovian reinforcement-learning tasks and can then transfer its skills to similar but even more complicated tasks, learning these faster still. This continual-learning approach is made possible by the unique properties of Temporal Transition Hierarchies, which allow existing skills to be amended and augmented in precisely the same way that they were constructed in the first place.
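
The mechanism the abstract alludes to can be pictured concretely. Below is a minimal sketch, in Python, of the core idea as described: each connection weight carries a base value plus a correction supplied by a higher-order unit that responds to the previous time step's inputs, and a new higher-order unit is recruited wherever a weight keeps receiving conflicting error signals. The class name, learning rate, and conflict_threshold heuristic are illustrative assumptions, not Ring's exact formulation.

```python
# A minimal sketch (not Ring's exact algorithm) of the idea behind
# Temporal Transition Hierarchies: weights are modulated by higher-order
# units that see the PREVIOUS input, and new units are added constructively
# where a weight receives conflicting error signals. All names and
# thresholds here are illustrative.
import numpy as np

class TransitionHierarchy:
    def __init__(self, n_in, n_out, lr=0.1, conflict_threshold=5):
        self.n_in = n_in
        self.lr = lr
        self.w = np.zeros((n_out, n_in))         # base (zeroth-order) weights
        self.high = {}                           # (i, j) -> higher-order weight vector
        self.conflict = np.zeros((n_out, n_in))  # running count of gradient sign flips
        self.prev_grad_sign = np.zeros((n_out, n_in))
        self.threshold = conflict_threshold

    def effective_w(self, x_prev):
        """Base weight plus the higher-order unit's response to the previous input."""
        w_eff = self.w.copy()
        for (i, j), v in self.high.items():
            w_eff[i, j] += v @ x_prev            # context-dependent correction
        return w_eff

    def forward(self, x, x_prev):
        return self.effective_w(x_prev) @ x

    def learn(self, x, x_prev, target):
        y = self.forward(x, x_prev)
        err = target - y                         # per-output error
        grad = np.outer(err, x)                  # gradient w.r.t. effective weights
        # Track conflicting error signals: a weight pushed in opposite
        # directions on different steps is a candidate for a new unit.
        sign = np.sign(grad)
        self.conflict += (sign * self.prev_grad_sign < 0)
        self.prev_grad_sign = sign
        self.w += self.lr * grad
        # Credit the higher-order units through the previous input.
        for (i, j), v in self.high.items():
            self.high[(i, j)] = v + self.lr * grad[i, j] * x_prev
        # Constructive step: add a higher-order unit for chronically
        # conflicted weights, so prior context can disambiguate them.
        for i, j in zip(*np.where(self.conflict >= self.threshold)):
            if (i, j) not in self.high:
                self.high[(i, j)] = np.zeros(self.n_in)
            self.conflict[i, j] = 0

# Usage (illustrative): step through a sequence, carrying the previous input.
# net = TransitionHierarchy(n_in=4, n_out=2)
# x_prev = np.zeros(4)
# for x, target in sequence:
#     net.learn(x, x_prev, target)
#     x_prev = x
```

The constructive step is the point the abstract emphasizes: an already-learned weight is never overwritten but disambiguated by adding context above it, which is why, in this reading, new skills can extend old ones rather than erase them.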
