To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning
[1] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming. Artificial Intelligence, 1995.
[2] Prasad Tadepalli, et al. H-Learning: A Reinforcement Learning Method for Optimizing Undiscounted Average Reward. 1994.
[3] Satinder Singh, et al. Learning to Solve Markovian Decision Processes. 1993.
[4] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards. ICML, 1993.
[5] Leslie Pack Kaelbling, et al. Learning in Embedded Systems. 1993.
[6] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning. Artificial Intelligence, 1991.
[7] Long-Ji Lin, et al. Reinforcement Learning for Robots Using Neural Networks. 1992.
[8] D. Sofge. The Role of Exploration in Learning Control. 1992.
[9] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming. ICML, 1990.
[10] A. Jalali, et al. Computationally Efficient Adaptive Control Algorithms for Markov Chains. Proceedings of the 28th IEEE Conference on Decision and Control, 1989.
[11] C. Watkins. Learning from Delayed Rewards. 1989.
[12] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models. 1987.
[13] J. Schmee. Applied Statistics—A Handbook of Techniques. 1984.
[14] Richard S. Sutton, et al. Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems. IEEE Transactions on Systems, Man, and Cybernetics, 1983.
[15] D. A. Bell, et al. Applied Statistics. Nature, 1953.