Temporal Difference Methods for the Variance of the Reward To Go
[1] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[2] Charles R. Johnson, et al. Matrix analysis, 1985.
[3] D. Krass, et al. Percentile performance criteria for limiting average Markov decision processes, 1995, IEEE Trans. Autom. Control.
[4] Andrew G. Barto, et al. Reinforcement learning, 1998.
[5] Masashi Sugiyama, et al. Parametric Return Density Estimation for Reinforcement Learning, 2010, UAI.
[6] Makoto Sato, et al. TD algorithm for the variance of return and mean-variance reinforcement learning, 2001.
[7] Shie Mannor, et al. Reinforcement learning with Gaussian processes, 2005, ICML.
[8] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[9] Shie Mannor, et al. Policy Gradients with Variance Related Risk Criteria, 2012, ICML.
[10] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[11] John N. Tsitsiklis, et al. Mean-Variance Optimization in Markov Decision Processes, 2011, ICML.
[12] Ralph Neuneier, et al. Risk-Sensitive Reinforcement Learning, 1998, Machine Learning.
[13] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.
[14] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[15] Jack L. Treynor, et al. Mutual Fund Performance, 2007.
[16] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[17] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[18] Alessandro Lazaric, et al. Finite-Sample Analysis of LSTD, 2010, ICML.
[19] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[20] Andrew G. Barto, et al. Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining, 2009, NIPS.
[21] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[22] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008, Texts and Readings in Mathematics.
[23] Joelle Pineau, et al. Informing sequential clinical decision-making through reinforcement learning: an empirical study, 2010, Machine Learning.
[24] M. J. Sobel. The variance of discounted Markov decision processes, 1982.
[25] Fritz Wysotzki, et al. Risk-Sensitive Reinforcement Learning Applied to Control under Constraints, 2005, J. Artif. Intell. Res.