An Analysis of Direct Reinforcement Learning in Non-Markovian Domains
[1] Michael L. Littman, et al. Memoryless policies: theoretical limitations and practical results, 1994.
[2] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[3] P. Spreij. Probability and Measure, 1996.
[4] Mark D. Pendrith, et al. Actual Return Reinforcement Learning versus Temporal Differences: Some Theoretical and Experimental Results, 1996, ICML.
[5] Chris Watkins, et al. Learning from delayed rewards, 1989.
[6] Sridhar Mahadevan, et al. Sensitive Discount Optimality: Unifying Discounted and Average Reward Reinforcement Learning, 1996, ICML.
[7] Mark D. Pendrith, et al. An Analysis of non-Markov Automata Games: Implications for Reinforcement Learning, 1997.
[8] K. Narendra, et al. Decentralized learning in finite Markov chains, 1985, 24th IEEE Conference on Decision and Control.
[9] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[10] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[11] Ian H. Witten, et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments, 1977, Inf. Control.
[12] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[13] Michael L. Littman, et al. Algorithms for Sequential Decision Making, 1996.
[14] Richard S. Sutton, et al. Reinforcement Learning with Replacing Eligibility Traces, 2005, Machine Learning.
[15] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[16] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.