Policy Gradient for Coherent Risk Measures
Shie Mannor | Mohammad Ghavamzadeh | Aviv Tamar | Yinlam Chow
[1] Laurent El Ghaoui, et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices, 2005, Oper. Res.
[2] Marek Petrik, et al. Tight Approximations of Dynamic Risk Measures, 2011, Math. Oper. Res.
[3] Shie Mannor, et al. Policy Gradients with Variance Related Risk Criteria, 2012, ICML.
[4] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[5] Peter W. Glynn, et al. Likelihood ratio gradient estimation for stochastic systems, 1990, CACM.
[6] Alexander Shapiro, et al. On a time consistency concept in risk averse multistage stochastic programming, 2009, Oper. Res. Lett.
[7] Carlo Acerbi. Spectral Measures of Risk: A Coherent Representation of Subjective Risk Aversion, 2002.
[8] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[9] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Vol. II, 1976.
[10] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.
[11] Anthony V. Fiacco, et al. Introduction to Sensitivity and Stability Analysis in Nonlinear Programming, 2012.
[12] Andrzej Ruszczynski, et al. Risk-averse dynamic programming for Markov decision processes, 2010, Math. Program.
[13] Philippe Artzner, et al. Coherent Measures of Risk, 1999.
[14] B. Roorda, et al. Coherent Acceptability Measures in Multiperiod Models, 2005.
[15] Josef Hadar, et al. Rules for Ordering Uncertain Prospects, 1969.
[16] Shie Mannor, et al. Percentile Optimization for Markov Decision Processes with Parameter Uncertainty, 2010, Oper. Res.
[17] Marco Pavone, et al. A framework for time-consistent, risk-averse model predictive control: Theory and algorithms, 2014, 2014 American Control Conference.
[18] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[19] S. Rachev, et al. Stable Paretian Models in Finance, 2000.
[20] Shie Mannor, et al. Optimizing the CVaR via Sampling, 2014, AAAI.
[21] Nicole Bäuerle, et al. Markov Decision Processes with Average-Value-at-Risk criteria, 2011, Math. Methods Oper. Res.
[22] Shie Mannor, et al. Scaling Up Robust MDPs using Function Approximation, 2014, ICML.
[23] Dale Schuurmans, et al. Learning Exercise Policies for American Options, 2009, AISTATS.
[24] Marek Petrik, et al. An Approximate Solution Method for Large Risk-Averse Markov Decision Processes, 2012, UAI.
[25] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[26] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[27] Gilles Pagès, et al. Computing VaR and CVaR using stochastic approximation and adaptive unconstrained importance sampling, 2008, Monte Carlo Methods Appl.
[28] Paul R. Milgrom, et al. Envelope Theorems for Arbitrary Choice Sets, 2002.
[29] A. Stuart, et al. Portfolio Selection: Efficient Diversification of Investments, 1959.
[30] Fanwen Meng, et al. A Regularized Sample Average Approximation Method for Stochastic Mathematical Programs with Nonsmooth Equality Constraints, 2006, SIAM J. Optim.
[31] Alexander Shapiro, et al. Lectures on Stochastic Programming: Modeling and Theory, 2009.
[32] Jonas Schmitt. Portfolio Selection Efficient Diversification Of Investments, 2016.
[33] R. Rockafellar, et al. Optimization of conditional value-at-risk, 2000.
[34] Vivek S. Borkar, et al. A sensitivity formula for risk-sensitive cost and the actor-critic algorithm, 2001, Syst. Control. Lett.
[35] Mohammad Ghavamzadeh, et al. Algorithms for CVaR Optimization in MDPs, 2014, NIPS.
[36] Mohammad Ghavamzadeh, et al. Actor-Critic Algorithms for Risk-Sensitive MDPs, 2013, NIPS.
[37] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[38] Takayuki Osogami, et al. Robustness and risk-sensitivity in Markov decision processes, 2012, NIPS.
[39] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[40] Alexander Shapiro, et al. Optimization of Convex Risk Functions, 2006, Math. Oper. Res.
[41] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[42] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[43] Matthew Saffell, et al. Learning to trade via direct reinforcement, 2001, IEEE Trans. Neural Networks.