Paper information - On Tight Convergence Rates of Without-replacement SGD

On Tight Convergence Rates of Without-replacement SGD

For solving finite-sum optimization problems, SGD without replacement sampling is empirically shown to outperform SGD. Denoting by $n$ the number of components in the cost and $K$ the number of epochs of the algorithm, several recent works have shown convergence rates of without-replacement SGD that have better dependency on $n$ and $K$ than the baseline rate of $O(1/(nK))$ for SGD. However, there are two main limitations shared among those works: the rates have extra poly-logarithmic factors in $nK$, and denoting by $\kappa$ the condition number of the problem, the rates hold after $\kappa^c\log(nK)$ epochs for some $c>0$. In this work, we overcome these limitations by analyzing step sizes that vary across epochs.
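As a rough illustration of the setting described in the abstract, the following minimal sketch runs without-replacement SGD on a toy finite sum: each epoch draws a fresh random permutation of the $n$ components and visits every component exactly once, with a step size that is fixed within an epoch but changes from one epoch to the next. The function name `without_replacement_sgd`, the `step_size` callback, the toy quadratic components, and the $1/(k+1)$ schedule are all assumptions made for illustration; in particular, the schedule is a placeholder and not the one analyzed in the paper.

```python
import random


def without_replacement_sgd(grads, x0, n_epochs, step_size, seed=0):
    """Run SGD that visits each component gradient exactly once per epoch.

    grads:     list of n functions; grads[i](x) is the gradient of component i at x.
    step_size: function mapping the 1-based epoch index k to a step size
               (hypothetical interface; the schedule is chosen by the caller).
    """
    rng = random.Random(seed)
    n = len(grads)
    x = x0
    for k in range(1, n_epochs + 1):
        order = list(range(n))
        rng.shuffle(order)      # fresh random permutation for this epoch
        eta = step_size(k)      # constant within the epoch, varies across epochs
        for i in order:         # every component is used exactly once
            x = x - eta * grads[i](x)
    return x


# Toy finite sum: f(x) = (1/n) * sum_i 0.5 * (x - a_i)^2, minimized at mean(a) = 2.5.
targets = [1.0, 2.0, 3.0, 4.0]
grads = [lambda x, a=a: x - a for a in targets]

# Placeholder schedule (an assumption, not the paper's): eta_k = 1 / (k + 1).
x_final = without_replacement_sgd(
    grads, x0=0.0, n_epochs=200, step_size=lambda k: 1.0 / (k + 1)
)
print(x_final)
```

With-replacement SGD would instead draw each index independently and uniformly at random, so some components could be visited several times in an epoch and others not at all.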

Suvrit Sra | Kwangjun Ahn

[1] Léon Bottou, et al. Stochastic Gradient Descent Tricks, 2012, Neural Networks: Tricks of the Trade.

[2] Prateek Jain, et al. SGD without Replacement: Sharper Rates for General Smooth Convex Functions, 2019, ICML.

[3] Marten van Dijk, et al. A Unified Convergence Analysis for Shuffling-Type Gradient Methods, 2020, ArXiv.

[4] Ohad Shamir, et al. Without-Replacement Sampling for Stochastic Gradient Methods, 2016, NIPS.

[5] D. Bertsekas, et al. Convergence Rate of Incremental Subgradient Algorithms, 2000.

[6] Albert R. Meyer, et al. Mathematics for Computer Science, 2017.

[7] Suvrit Sra, et al. Random Shuffling Beats SGD after Finite Epochs, 2018, ICML.

[8] Dimitris Papailiopoulos, et al. Closing the convergence gap of SGD without replacement, 2020, ICML.

[9] L. Bottou. Curiously Fast Convergence of some Stochastic Gradient Descent Algorithms, 2009.

[10] A. Arnold, et al. Mathematics for computer science, 1996.

[11] H. Robbins. A Stochastic Approximation Method, 1951.

[12] Ohad Shamir, et al. How Good is SGD with Random Shuffling?, 2019, COLT.

[13] V. Fabian. Stochastic Approximation of Minima with Improved Asymptotic Speed, 1967.

[14] Asuman E. Ozdaglar, et al. Why random reshuffling beats stochastic gradient descent, 2015, Mathematical Programming.