A Tutorial Survey of Reinforcement Learn

This paper gives a compact, self-contained tutorial survey of reinforcement learning, a tool that is increasingly finding application in the development of intelligent dynamic systems. Research on reinforcement learning during the past decade has led to the development of a variety of useful algorithms. This paper surveys the literature and presents the algorithms in a cohesive framework.

T. Keerthi | Balaraman Ravindran
