The convergence of TD(λ) for general λ
[1] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[2] J. Gillis, et al. Matrix Iterative Analysis, 1961.
[3] E. Feigenbaum, et al. Computers and Thought, 1963.
[4] P. B. Coaker, et al. Applied Dynamic Programming, 1964.
[5] R. Bellman. Dynamic programming, 1957, Science.
[6] A. L. Samuel, et al. Some studies in machine learning using the game of checkers. II: recent progress, 1967.
[7] A. H. Klopf, et al. Brain Function and Adaptive Systems: A Heterostatic Theory, 1972.
[8] James S. Albus, et al. New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), 1975.
[9] S. Ostrach, et al. Heat Transfer Augmentation in Laminar Fully Developed Channel Flow by Means of Heating From Below, 1975.
[10] Ian H. Witten, et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments, 1977, Inf. Control.
[11] John S. Edwards, et al. The Hedonistic Neuron: A Theory of Memory, Learning and Intelligence, 1983.
[12] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[13] Steven Edward Hampson, et al. A neural model of adaptive behavior, 1983.
[14] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[15] John R. Anderson, et al. Machine learning - an artificial intelligence approach, 1982, Symbolic computation.
[16] John H. Holland, et al. Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems, 1995.
[17] S. Thomas Alexander, et al. Adaptive Signal Processing, 1986, Texts and Monographs in Computer Science.
[18] Bart W. Stuck, et al. A Computer and Communication Network Performance Analysis Primer (Prentice Hall, Englewood Cliffs, NJ, 1985; revised, 1987), 1987, Int. CMG Conference.
[19] Stephen M. Omohundro, et al. Efficient Algorithms with Neural Network Behavior, 1987, Complex Systems.
[20] J. W. Moore. Learning and Sequential Decision Making, 1989.
[21] S. Hampson. Connectionistic Problem Solving: Computational Aspects of Biological Learning (Steven E. Hampson, Birkhäuser, 1990), 1990, Trends in Neurosciences.
[22] Andrew W. Moore, et al. Efficient memory-based learning for robot control, 1990.
[23] Paul J. Werbos, et al. Consistency of HDP applied to a simple reinforcement learning problem, 1990, Neural Networks.
[24] M. Gabriel, et al. Learning and Computational Neuroscience: Foundations of Adaptive Networks, 1990.
[25] P. Dayan. Reinforcing connectionism: learning the statistical way, 1991.
[26] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[27] Richard S. Sutton, et al. Learning to Predict by the Methods of Temporal Differences, 1988, Machine Learning.