Risk-Aversion in Multi-armed Bandits