
Two steps to risk sensitivity

Distributional reinforcement learning (RL) – in which agents learn about all the possible long-term consequences of their actions, and not just the expected value – is of great recent interest. One of the most important affordances of a distributional view is facilitating a modern, measured, approach to risk when outcomes are not completely certain. By contrast, psychological and neuroscientific investigations into decision making under risk have utilized a variety of more venerable theoretical models such as prospect theory that lack axiomatically desirable properties such as coherence. Here, we consider a particularly relevant risk measure for modeling human and animal planning, called conditional value-at-risk (CVaR), which quantifies worst-case outcomes (e.g., vehicle accidents or predation). We first adopt a conventional distributional approach to CVaR in a sequential setting and reanalyze the choices of human decision-makers in the well-known two-step task, revealing substantial risk aversion that had been lurking under stickiness and perseveration. We then consider a further critical property of risk sensitivity, namely time consistency, showing alternatives to this form of CVaR that enjoy this desirable characteristic. We use simulations to examine settings in which the various forms differ in ways that have implications for human and animal planning and behavior.
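As a rough illustration of the risk measure named above, the following minimal sketch computes CVaR for a discrete outcome distribution. It assumes the lower-tail convention for rewards (the expected outcome within the worst alpha fraction of probability mass); the function name `cvar`, its arguments, and the example numbers are hypothetical choices made for illustration and are not taken from the paper.

```python
def cvar(outcomes, probs, alpha):
    """Lower-tail CVaR of a discrete distribution (illustrative sketch).

    Returns the expected outcome within the worst `alpha` fraction of
    probability mass. alpha = 1 recovers the ordinary expected value;
    smaller alpha focuses on progressively worse outcomes.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")

    remaining = alpha   # probability mass still to be collected from the tail
    total = 0.0         # probability-weighted sum of the tail outcomes
    # Walk through the outcomes from worst to best.
    for x, p in sorted(zip(outcomes, probs)):
        take = min(p, remaining)   # only part of an atom may fall in the tail
        total += take * x
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


# Hypothetical gamble: a rare large loss, a common zero, a frequent small gain.
outcomes = [-10.0, 0.0, 5.0]
probs = [0.1, 0.4, 0.5]

risk_neutral = cvar(outcomes, probs, 1.0)   # ordinary mean
moderate = cvar(outcomes, probs, 0.5)       # mean of the worst half
extreme = cvar(outcomes, probs, 0.1)        # mean of the worst tenth
```

Under this convention, lowering alpha makes the evaluation of the same gamble more pessimistic, which is one way to read risk aversion as an increased weight on the worst outcomes.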

Chris Gagne | Peter Dayan
