[1] Alessandro Lazaric, et al. Risk-Aversion in Multi-armed Bandits, 2012, NIPS.
[2] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[3] P. Massart. The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality, 1990.
[4] Evan Fisher. On the Law of the Iterated Logarithm for Martingales, 1992.
[5] Rémi Munos, et al. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences, 2011, COLT.
[6] Michèle Sebag, et al. Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits, 2013, ACML.
[7] Qing Zhao, et al. Risk-Averse Multi-Armed Bandit Problems Under Mean-Variance Measure, 2016, IEEE Journal of Selected Topics in Signal Processing.
[8] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[9] R. Rockafellar, et al. Optimization of conditional value-at-risk, 2000.
[10] Krishnendu Chatterjee, et al. Generalized Risk-Aversion in Stochastic Multi-Armed Bandits, 2014, ArXiv.
[11] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[12] Philippe Artzner, et al. Coherent Measures of Risk, 1999.
[13] Odalric-Ambrym Maillard, et al. Robust Risk-Averse Stochastic Multi-armed Bandits, 2013, ALT.
[14] H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985.
[15] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.