Learning the Variance of the Reward-To-Go