Paper information - Regret to the best vs. regret to the average

Abstract: We study online regret minimization algorithms in an experts setting. In this setting, the algorithm chooses a distribution over experts at each time step and receives a gain that is a weighted average of the experts’ instantaneous gains. We consider a bicriteria setting, examining not only the standard notion of regret to the best expert, but also the regret to the average of all experts, the regret to any given fixed mixture of experts, or the regret to the worst expert. This study leads both to new understanding of the limitations of existing no-regret algorithms, and to new algorithms with novel performance guarantees. More specifically, we show that any algorithm that achieves only $O(\sqrt{T})$ cumulative regret to the best expert on a sequence of $T$ trials must, in the worst case, suffer regret $\Omega(\sqrt{T})$ to the average, and that for a wide class of update rules that includes many existing no-regret algorithms (such as Exponential Weights and Follow the Perturbed Leader), the product of the regret to the best and the regret to the average is, in the worst case, $\Omega(T)$. We then describe and analyze two new algorithms that both achieve cumulative regret only $O(\sqrt{T}\log T)$ to the best expert and have only constant regret to any given fixed distribution over experts (that is, with no dependence on either $T$ or the number of experts $N$). The key to the first algorithm is the gradual increase in the “aggressiveness” of updates in response to observed divergences in expert performances. The second algorithm is a simple twist on standard exponential-update algorithms.
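To make the two regret notions concrete, the following is a minimal sketch of the standard Exponential Weights update, one of the existing algorithms the abstract names. It is not either of the paper's two new algorithms. The function name `exponential_weights`, the learning rate `eta`, and the toy gain sequence are assumptions chosen for illustration; gains are assumed to lie in [0, 1].

```python
import math


def exponential_weights(gains, eta):
    """Run standard Exponential Weights on a T x N list of gains in [0, 1].

    Returns (regret_to_best, regret_to_average), both measured against
    cumulative gains over the whole sequence.
    """
    n = len(gains[0])
    cum = [0.0] * n      # cumulative gain of each expert so far
    alg_gain = 0.0       # cumulative gain of the algorithm
    for g in gains:
        # Weight each expert by exp(eta * cumulative gain).
        # Subtracting the max keeps the exponentials from overflowing.
        m = max(cum)
        w = [math.exp(eta * (c - m)) for c in cum]
        z = sum(w)
        p = [wi / z for wi in w]
        # The algorithm's gain is the weighted average of the experts' gains.
        alg_gain += sum(pi * gi for pi, gi in zip(p, g))
        cum = [c + gi for c, gi in zip(cum, g)]
    regret_to_best = max(cum) - alg_gain
    regret_to_average = sum(cum) / n - alg_gain
    return regret_to_best, regret_to_average


# Toy sequence (an assumption): expert 0 always gains 1, expert 1 always gains 0.
toy_gains = [[1.0, 0.0]] * 100

# eta = 0 never updates, so the weights stay uniform:
# regret to the best is T/2 = 50 and regret to the average is 0.
print(exponential_weights(toy_gains, 0.0))

# A larger eta shifts weight to the leader, which lowers regret to the best.
print(exponential_weights(toy_gains, 0.5))
```

On this toy sequence the more aggressive update helps on both criteria, because the leader never changes. The abstract's lower bounds concern worst-case sequences, where moving weight quickly toward the current leader can cost regret to the average.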

[1] Manfred K. Warmuth, et al. The weighted majority algorithm, 1989, 30th Annual Symposium on Foundations of Computer Science.

[2] Neil D. Pearson, et al. Consumption and Portfolio Policies With Incomplete Markets and Short‐Sale Constraints: the Finite‐Dimensional Case, 1991.

[3] Vladimir Vovk, et al. A game of prediction with expert advice, 1995, COLT '95.

[4] T. Cover. Universal Portfolios, 1996.

[5] Yoav Freund, et al. Predicting a binary sequence almost as well as the optimal biased coin, 2003, COLT '96.

[6] Yoram Singer, et al. On‐Line Portfolio Selection Using Multiplicative Updates, 1998, ICML.

[7] Claudio Gentile, et al. Adaptive and Self-Confident On-Line Learning Algorithms, 2000, J. Comput. Syst. Sci.

[8] Santosh S. Vempala, et al. Efficient algorithms for online decision problems, 2005, Journal of Computer and System Sciences.

[9] Gábor Lugosi, et al. Prediction, learning, and games, 2006.

[10] Yishay Mansour, et al. Improved second-order bounds for prediction with expert advice, 2006, Machine Learning.

[11] Yishay Mansour, et al. Regret to the Best vs. Regret to the Average, 2007, COLT.