Sequential Decision Making With Coherent Risk

We provide sampling-based algorithms for optimization under a coherent-risk objective. The class of coherent risk measures is widely accepted in finance and operations research, among other fields, and encompasses popular risk measures such as conditional value at risk (CVaR) and mean-semideviation. Our approach is suitable for problems in which tunable parameters control the distribution of the cost, such as reinforcement learning or approximate dynamic programming with a parameterized policy; such problems cannot be solved with previous approaches. We consider both static risk measures and time-consistent dynamic risk measures. For static risk measures our approach is in the spirit of policy gradient methods, while for dynamic risk measures we use actor-critic type algorithms.
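To make the sampling-based idea concrete for the CVaR case, here is a minimal sketch of a likelihood-ratio (score-function) gradient estimate of CVaR under a toy cost model. The Gaussian cost `C ~ N(theta, sigma^2)`, the function names, and all parameter choices are illustrative assumptions, not the paper's construction; the estimator follows the standard form `∇CVaR ≈ E[(C - VaR) · 1{C ≥ VaR} · ∇log p(C)] / (1 - α)`.

```python
import numpy as np

def cvar_gradient_estimate(theta, alpha=0.95, n=200_000, sigma=1.0, seed=0):
    """Sampling sketch: estimate CVaR_alpha of C ~ N(theta, sigma^2) and its
    gradient w.r.t. theta via the likelihood-ratio (score-function) trick.
    The Gaussian cost model is a toy stand-in for a policy-controlled cost."""
    rng = np.random.default_rng(seed)
    costs = rng.normal(theta, sigma, size=n)
    var = np.quantile(costs, alpha)        # empirical VaR (alpha-quantile)
    tail = costs >= var                    # indicator of the alpha-tail
    cvar = costs[tail].mean()              # empirical CVaR: mean of the tail
    # Score function of the Gaussian cost model: d/d theta log p(c) = (c - theta) / sigma^2
    score = (costs - theta) / sigma**2
    # Likelihood-ratio CVaR gradient estimate
    grad = ((costs - var) * tail * score).mean() / (1.0 - alpha)
    return cvar, grad
```

For this toy model the true gradient is 1 (shifting the mean shifts every quantile, hence the CVaR, one-for-one), which gives an easy sanity check on the estimator. In an actual policy-optimization setting, the score of the cost distribution would be replaced by the sum of per-step policy score functions along a sampled trajectory.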
