Paper information - Risk-Aversion in Multi-armed Bandits

Risk-Aversion in Multi-armed Bandits

Stochastic multi-armed bandits solve the exploration-exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk-aversion where the objective is to compete against the arm with the best risk-return trade-off. This setting proves to be more difficult than the standard multi-armed bandit setting due in part to an exploration risk, which introduces a regret associated with the variability of an algorithm. Using variance as a measure of risk, we define two algorithms, investigate their theoretical guarantees, and report preliminary empirical results.
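The abstract names variance as the risk measure but does not give the exact form of the risk-return trade-off. As a minimal sketch, assume a mean-variance score (variance minus a risk-tolerance weight times the mean, lower is better). The function names, the parameter `rho`, and the sample rewards below are hypothetical and serve only to illustrate how the "best" arm can differ from the arm with the highest mean.

```python
from statistics import fmean, pvariance


def mean_variance(rewards, rho):
    """Empirical mean-variance score (assumed form): variance - rho * mean.

    Lower is better; rho is a hypothetical risk-tolerance weight.
    """
    return pvariance(rewards) - rho * fmean(rewards)


def best_risk_return_arm(samples, rho):
    """Return the index of the arm with the lowest empirical score."""
    scores = [mean_variance(rewards, rho) for rewards in samples]
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical reward samples for two arms (illustrative only).
samples = [
    [0.0, 1.0, 0.0, 1.0],  # arm 0: higher mean, higher variance
    [0.4, 0.4, 0.4, 0.4],  # arm 1: lower mean, zero variance
]

# A risk-averse learner (small rho) prefers the stable arm,
# while a large rho weights the mean more heavily.
risk_averse_choice = best_risk_return_arm(samples, rho=1.0)
risk_tolerant_choice = best_risk_return_arm(samples, rho=10.0)
```

With a small `rho` the score is dominated by variance, so the stable arm wins even though its mean is lower. This only ranks arms from fixed samples; it is not one of the two algorithms the paper defines.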

[1] Manfred K. Warmuth, et al. Online variance minimization, 2011, Machine Learning.

[2] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[3] Varun Grover, et al. Active learning in heteroscedastic noise, 2010, Theor. Comput. Sci.

[4] E. Rowland. Theory of Games and Economic Behavior, 1946, Nature.

[5] Jean-Yves Audibert, et al. Deviations of Stochastic Bandit Regret, 2011, ALT.

[6] R. Munos, et al. Best Arm Identification in Multi-Armed Bandits, 2010, COLT.

[7] Jean-Yves Audibert, et al. Regret Bounds and Minimax Policies under Partial Monitoring, 2010, J. Mach. Learn. Res.

[8] Massimiliano Pontil, et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.

[9] P. Massart. The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality, 1990.

[10] David B. Brown, et al. Large deviations bounds for estimating conditional value-at-risk, 2007, Oper. Res. Lett.

[11] Jennifer Wortman Vaughan, et al. Risk-Sensitive Online Learning, 2006, ALT.

[12] C. Gollier. The economics of risk and time, 2001.

[13] Csaba Szepesvári, et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theor. Comput. Sci.

[14] Philippe Artzner, et al. Coherent Measures of Risk, 1999.

[15] H. Robbins. Some aspects of the sequential design of experiments, 1952.