Sequential Decision Making With Coherent Risk

We provide sampling-based algorithms for optimization under a coherent-risk objective. The class of coherent risk measures is widely accepted in finance and operations research, among other fields, and encompasses popular risk measures such as conditional value at risk (CVaR) and mean-semideviation. Our approach is suitable for problems in which tunable parameters control the distribution of the cost, such as reinforcement learning or approximate dynamic programming with a parameterized policy; such problems cannot be solved with previous approaches. We consider both static risk measures and time-consistent dynamic risk measures. For static risk measures our approach is in the spirit of policy gradient methods, while for dynamic risk measures we use actor-critic type algorithms.
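To make the sampling-based idea concrete for the CVaR case, here is a minimal sketch of a likelihood-ratio (score-function) gradient estimate of CVaR under a toy cost model. The Gaussian cost `C ~ N(theta, sigma^2)`, the function names, and all parameter choices are illustrative assumptions, not the paper's construction; the estimator follows the standard form `∇CVaR ≈ E[(C - VaR) · 1{C ≥ VaR} · ∇log p(C)] / (1 - α)`.

```python
import numpy as np

def cvar_gradient_estimate(theta, alpha=0.95, n=200_000, sigma=1.0, seed=0):
    """Sampling sketch: estimate CVaR_alpha of C ~ N(theta, sigma^2) and its
    gradient w.r.t. theta via the likelihood-ratio (score-function) trick.
    The Gaussian cost model is a toy stand-in for a policy-controlled cost."""
    rng = np.random.default_rng(seed)
    costs = rng.normal(theta, sigma, size=n)
    var = np.quantile(costs, alpha)        # empirical VaR (alpha-quantile)
    tail = costs >= var                    # indicator of the alpha-tail
    cvar = costs[tail].mean()              # empirical CVaR: mean of the tail
    # Score function of the Gaussian cost model: d/d theta log p(c) = (c - theta) / sigma^2
    score = (costs - theta) / sigma**2
    # Likelihood-ratio CVaR gradient estimate
    grad = ((costs - var) * tail * score).mean() / (1.0 - alpha)
    return cvar, grad
```

For this toy model the true gradient is 1 (shifting the mean shifts every quantile, hence the CVaR, one-for-one), which gives an easy sanity check on the estimator. In an actual policy-optimization setting, the score of the cost distribution would be replaced by the sum of per-step policy score functions along a sampled trajectory.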
