Actual Return Reinforcement Learning versus Temporal Differences: Some Theoretical and Experimental Results
暂无分享,去创建一个
[1] Ian H. Witten,et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments , 1977, Inf. Control..
[2] Claude Sammut,et al. Recent progress with BOXES , 1994, Machine Intelligence 13.
[3] Mark D. Pendrith. On Reinforcement Learning of Control Actions in Noisy and Non-Markovian Domains , 1994 .
[4] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[5] Pawea Cichosz. Truncating Temporal Diierences: on the Eecient Implementation of Td() for Reinforcement Learning , 1995 .
[6] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[7] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.
[8] Rodney A. Brooks,et al. Intelligence Without Reason , 1991, IJCAI.
[9] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[10] Michael I. Jordan,et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.
[11] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[12] Andrew G. Barto,et al. Monte Carlo Matrix Inversion and Reinforcement Learning , 1993, NIPS.
[13] Richard S. Sutton,et al. Reinforcement Learning with Replacing Eligibility Traces , 2005, Machine Learning.