IDSIA-1106 General Discounting versus Average Reward
[1] Marcus Hutter, et al. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability (Texts in Theoretical Computer Science. An EATCS Series), 2006.
[2] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[3] Konstantin Avrachenkov, et al. Sensitive discount optimality via nested linear programs for ergodic Markov decision processes, 1999, Information, Decision and Control. Data and Information Fusion Symposium, Signal Processing and Communications Symposium and Decision and Control Symposium. Proceedings.
[4] F. Kelly. Multi-Armed Bandits with Discount Factor Near One: The Bernoulli Case, 1981.
[5] P. Samuelson. A Note on Measurement of Utility, 1937.
[6] Stuart J. Russell, et al. Artificial Intelligence, 1999.
[7] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[8] Pravin Varaiya, et al. Stochastic Systems: Estimation, Identification, and Adaptive Control, 1986.
[9] R. H. Strotz. Myopia and Inconsistency in Dynamic Utility Maximization, 1955.
[10] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[11] Shane Frederick, et al. Time Discounting and Time Preference: A Critical Review, 2022.
[12] Sridhar Mahadevan, et al. Sensitive Discount Optimality: Unifying Discounted and Average Reward Reinforcement Learning, 1996, ICML.