Practical Risk Measures in Reinforcement Learning

Practical applications of Reinforcement Learning (RL) often involve risk considerations. We study a generalized approximation scheme for risk measures, based on Monte-Carlo simulations, where the risk measures need not necessarily be \emph{coherent}. We demonstrate that, even in simple problems, measures such as the variance of the reward-to-go do not capture the risk in a satisfactory manner. In addition, we show how a risk measure can be derived from the model's realizations. We propose a neural architecture for estimating the risk, and suggest a risk-critic architecture that can be used to optimize a policy under general risk measures. We conclude with experiments that demonstrate the efficacy of our approach.
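To make the Monte-Carlo view concrete, the following is a minimal illustrative sketch, not the paper's architecture, of estimating a tail-risk measure of the reward-to-go from sampled trajectories, using CVaR as one concrete example of a risk measure. The function names and the toy return distribution are assumptions introduced purely for illustration.

import numpy as np

def discounted_return(rewards, gamma=0.99):
    # Discounted reward-to-go of a single sampled trajectory.
    return sum(r * gamma ** t for t, r in enumerate(rewards))

def monte_carlo_cvar(returns, alpha=0.05):
    # Monte-Carlo CVaR estimate: the mean of the worst alpha-fraction
    # of the sampled returns (left tail of the return distribution).
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

# Toy example: sample returns from a hypothetical policy and compare the
# mean return with its 5% left-tail CVaR.
rng = np.random.default_rng(0)
sampled_returns = [discounted_return(rng.normal(1.0, 2.0, size=50)) for _ in range(1000)]
print("mean return :", np.mean(sampled_returns))
print("CVaR (5%)   :", monte_carlo_cvar(sampled_returns, alpha=0.05))

The same sampling scheme extends to other risk measures (e.g., variance or percentile criteria) by replacing the tail-averaging step with the corresponding statistic of the sampled returns.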
