Reinforcement learning with replacing eligibility traces
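The replacing-trace idea named in the title modifies tabular TD(λ): when a state is visited, its eligibility trace is reset to 1 instead of being incremented (the accumulating rule). A minimal sketch on a simple random-walk prediction task; the environment, step size, and other parameter values here are illustrative assumptions, not taken from the paper:

```python
import random

def td_lambda(n_states=5, episodes=200, alpha=0.1, gamma=1.0, lam=0.9,
              replacing=True, seed=0):
    """Tabular TD(lambda) on a random walk over states 0..n_states-1.

    Episodes start in the middle state and move left or right at random;
    falling off the left end gives reward 0, off the right end reward 1.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states                  # state-value estimates
    for _ in range(episodes):
        e = [0.0] * n_states              # eligibility traces
        s = n_states // 2
        while True:
            s2 = s + rng.choice((-1, 1))
            if s2 < 0:
                r, done = 0.0, True
            elif s2 >= n_states:
                r, done = 1.0, True
            else:
                r, done = 0.0, False
            delta = r + (0.0 if done else gamma * V[s2]) - V[s]
            if replacing:
                e[s] = 1.0                # replacing trace: reset to 1
            else:
                e[s] += 1.0               # accumulating trace: increment
            for i in range(n_states):
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam       # decay all traces
            if done:
                break
            s = s2
    return V

V = td_lambda(replacing=True)
```

With accumulating traces a state revisited many times in one episode can build a trace larger than 1 and receive an outsized update; the replacing rule caps each trace at 1, which is the distinction the paper analyzes.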
[1] W. Wasow. A note on the inversion of matrices by random walks, 1952.
[2] A. H. Klopf, et al. Brain Function and Adaptive Systems: A Heterostatic Theory, 1972.
[3] Allen Van Gelder, et al. Computer Algorithms: Introduction to Design and Analysis, 1978.
[4] Reuven Y. Rubinstein, et al. Simulation and the Monte Carlo Method, 1981, Wiley Series in Probability and Mathematical Statistics.
[5] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[6] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[7] John H. Holland, et al. Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems, 1995.
[8] Pravin Varaiya, et al. Stochastic Systems: Estimation, Identification, and Adaptive Control, 1986.
[9] W. T. Miller, et al. CMAC: an associative neural network alternative to backpropagation, 1990, Proceedings of the IEEE.
[10] A. Moore. Variable Resolution Dynamic Programming, 1991, ML.
[11] Etienne Barnard, et al. Temporal-difference methods and Markov models, 1993, IEEE Transactions on Systems, Man, and Cybernetics.
[12] Andrew G. Barto, et al. Monte Carlo Matrix Inversion and Reinforcement Learning, 1993, NIPS.
[13] Richard S. Sutton, et al. Online Learning with Random Representations, 1993, ICML.
[14] Peter Dayan, et al. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.
[15] Michael I. Jordan, et al. MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[16] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[17] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[18] Richard S. Sutton, et al. TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.
[19] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics and Autonomous Systems.
[20] Stewart W. Wilson. Classifier Fitness Based on Accuracy, 1995, Evolutionary Computation.
[21] Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[22] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[23] Gerald Tesauro, et al. Practical issues in temporal difference learning, 1992, Machine Learning.
[24] Terrence J. Sejnowski, et al. TD(λ) Converges with Probability 1, 1994, Machine Learning.
[25] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[26] Jing Peng, et al. Incremental multi-step Q-learning, 1994, Machine Learning.
[27] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.