To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning
