Optimum Follow the Leader Algorithm

Consider the following setting for an on-line algorithm (introduced in [FS97]) that learns from a set of experts: in trial $t$ the algorithm chooses expert $i$ with probability $p^{t}_{i}$. At the end of the trial a loss vector $L^{t}\in[0,R]^{n}$ for the $n$ experts is received and an expected loss of $\sum_{i}p^{t}_{i}L^{t}_{i}$ is incurred. A simple algorithm for this setting is the Hedge algorithm, which uses the probabilities $p^{t}_{i} \propto \exp(-\eta L^{<t}_{i})$, where $L^{<t}_{i}$ denotes the cumulative loss of expert $i$ before trial $t$. This algorithm and its analysis are a simple reformulation of the randomized version of the Weighted Majority algorithm (WMR) [LW94], which was designed for the absolute loss. The total expected loss of the algorithm is close to the total loss of the best expert, $L_{*} = \min_{i} L^{\leq T}_{i}$. That is, when the learning rate $\eta$ is optimally tuned based on $L_{*}$, $R$ and $n$, the total expected loss of the Hedge/WMR algorithm is at most $$L_{*} + \sqrt{2}\sqrt{L_{*}R\log n} + O(\log n).$$ The factor of $\sqrt{2}$ is in some sense optimal [Vov97].
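The Hedge update above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation: the function names and the stability trick (subtracting the minimum cumulative loss before exponentiating, which leaves the normalized probabilities unchanged) are assumptions for the example.

```python
import math

def hedge_probabilities(cum_losses, eta):
    """Probabilities p_i proportional to exp(-eta * L_i^{<t})."""
    # Shift by the minimum cumulative loss for numerical stability;
    # the common factor cancels after normalization.
    m = min(cum_losses)
    weights = [math.exp(-eta * (L - m)) for L in cum_losses]
    total = sum(weights)
    return [w / total for w in weights]

def run_hedge(loss_vectors, eta):
    """Play Hedge over the given trials; return the total expected
    loss and the final cumulative losses of the experts."""
    n = len(loss_vectors[0])
    cum_losses = [0.0] * n
    expected_loss = 0.0
    for losses in loss_vectors:  # each losses[i] lies in [0, R]
        p = hedge_probabilities(cum_losses, eta)
        expected_loss += sum(pi * li for pi, li in zip(p, losses))
        cum_losses = [c + l for c, l in zip(cum_losses, losses)]
    return expected_loss, cum_losses
```

For instance, with two experts where expert 0 always incurs loss 0 and expert 1 always incurs loss 1, the weight on the bad expert decays geometrically, so the total expected loss stays bounded while the bad expert's cumulative loss grows with $T$.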