To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning
