暂无分享,去创建一个
[1] M. J. Sobel. The variance of discounted Markov decision processes , 1982 .
[2] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Vol. II , 1976 .
[3] Alessandro Lazaric,et al. Finite-Sample Analysis of LSTD , 2010, ICML.
[4] Jack L. Treynor,et al. MUTUAL FUND PERFORMANCE* , 2007 .
[5] Charles R. Johnson,et al. Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.
[6] John N. Tsitsiklis,et al. Mean-Variance Optimization in Markov Decision Processes , 2011, ICML.
[7] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[8] Dimitri P. Bertsekas,et al. Temporal Difference Methods for General Projected Equations , 2011, IEEE Transactions on Automatic Control.
[9] Andrew G. Barto,et al. Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining , 2009, NIPS.
[10] Shie Mannor,et al. Policy Gradients with Variance Related Risk Criteria , 2012, ICML.
[11] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[12] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[13] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[14] Shie Mannor,et al. Percentile Optimization for Markov Decision Processes with Parameter Uncertainty , 2010, Oper. Res..
[15] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[16] J. Cockcroft. Investment in Science , 1962, Nature.
[17] Masashi Sugiyama,et al. Parametric Return Density Estimation for Reinforcement Learning , 2010, UAI.
[18] R. Howard,et al. Risk-Sensitive Markov Decision Processes , 1972 .