Optimal Rates for Random Order Online Optimization

We study online convex optimization in the random order model, recently proposed by Garber et al. [9], where the loss functions may be chosen by an adversary but are then presented to the online algorithm in a uniformly random order. Focusing on the scenario where the cumulative loss function is (strongly) convex, while individual loss functions are smooth but possibly non-convex, we give algorithms that achieve the optimal bounds and significantly improve over the results of Garber et al. [9], completely removing the dimension dependence and improving the scaling with respect to the strong convexity parameter. Our analysis relies on novel connections between algorithmic stability and generalization for without-replacement sampling, analogous to those studied in the with-replacement i.i.d. setting, as well as on a refined average-stability analysis of stochastic gradient descent.
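To make the setting concrete, the following is a minimal sketch (not the authors' algorithm) of the random order model: an adversary fixes a collection of smooth losses, a uniform random permutation determines the order in which they are revealed, and an online learner accumulates regret against the best fixed point in hindsight. The 1-D quadratic losses, step-size schedule, and projection radius below are illustrative assumptions chosen so that individual losses may be non-convex while the cumulative loss is strongly convex, mirroring the scenario studied in the paper.

```python
# Sketch of the random order online model with online gradient descent.
# Assumed setup: f_i(x) = 0.5 * a_i * x^2 + b_i * x, with some a_i < 0
# (individually non-convex) but positive average curvature, so the
# cumulative loss is strongly convex.
import numpy as np

rng = np.random.default_rng(0)
T = 1000                  # number of rounds
R = 10.0                  # radius of the feasible interval [-R, R]

# Adversarially chosen coefficients: some a_i are negative, but their
# mean is bounded away from zero.
a = rng.uniform(-0.5, 1.5, size=T)
b = rng.uniform(-1.0, 1.0, size=T)
lam = a.mean()            # strong-convexity parameter of the average loss

# Random order model: the losses are fixed in advance, then revealed
# to the learner in a uniformly random order.
perm = rng.permutation(T)
a, b = a[perm], b[perm]

def grad(i, x):
    """Gradient of f_i(x) = 0.5 * a_i * x^2 + b_i * x."""
    return a[i] * x + b[i]

# Online gradient descent with the classical 1/(lam * t) step size.
x = 0.0
total_loss = 0.0
for t in range(T):
    total_loss += 0.5 * a[t] * x**2 + b[t] * x   # suffer loss f_t(x_t)
    x -= grad(t, x) / (lam * (t + 1))            # gradient step
    x = float(np.clip(x, -R, R))                 # project onto [-R, R]

# Best fixed point in hindsight: minimizer of the cumulative quadratic.
x_star = float(np.clip(-b.sum() / a.sum(), -R, R))
best_loss = 0.5 * a.sum() * x_star**2 + b.sum() * x_star
print(f"regret after {T} rounds: {total_loss - best_loss:.3f}")
```

Since the permutation is uniformly random, regret here is a random quantity; the bounds discussed in the paper control its expectation over the draw of the permutation.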

[1] Elad Hazan et al. Introduction to Online Convex Optimization. Found. Trends Optim., 2016.

[2] Jan Vondrák et al. High probability generalization bounds for uniformly stable algorithms with nearly optimal rate. COLT, 2019.

[3] Elad Hazan et al. Logarithmic regret algorithms for online convex optimization. Machine Learning, 2006.

[4] Ohad Shamir et al. How Good is SGD with Random Shuffling? COLT, 2019.

[5] André Elisseeff et al. Stability and Generalization. J. Mach. Learn. Res., 2002.

[6] Jeffrey Scott Vitter et al. Random sampling with a reservoir. TOMS, 1985.

[7] Dimitris Papailiopoulos et al. Closing the convergence gap of SGD without replacement. ICML, 2020.

[8] Elad Hazan et al. An optimal algorithm for stochastic strongly-convex optimization. arXiv:1006.2425, 2010.

[9] Dan Garber et al. Online Convex Optimization in the Random Order Model. ICML, 2020.

[10] Marten van Dijk et al. A Unified Convergence Analysis for Shuffling-Type Gradient Methods. J. Mach. Learn. Res., 2020.

[11] Ohad Shamir et al. Without-Replacement Sampling for Stochastic Gradient Methods. NIPS, 2016.

[12] Sébastien Bubeck et al. Convex Optimization: Algorithms and Complexity. Found. Trends Mach. Learn., 2014.

[13] Luc Devroye et al. Distribution-free performance bounds with the resubstitution error estimate (Corresp.). IEEE Trans. Inf. Theory, 1979.

[14] Yoram Singer et al. Train faster, generalize better: Stability of stochastic gradient descent. ICML, 2015.

[15] Ben London et al. A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent. NIPS, 2017.

[16] Suvrit Sra et al. Random Shuffling Beats SGD after Finite Epochs. ICML, 2018.

[17] Prateek Jain et al. SGD without Replacement: Sharper Rates for General Smooth Convex Functions. ICML, 2019.

[18] Raef Bassily et al. Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses. NeurIPS, 2020.

[19] Suvrit Sra et al. SGD with shuffling: optimal rates without component convexity and large epoch requirements. NeurIPS, 2020.

[20] Ohad Shamir et al. Learnability, Stability and Uniform Convergence. J. Mach. Learn. Res., 2010.

[21] W. Rogers et al. A Finite Sample Distribution-Free Performance Bound for Local Discrimination Rules. 1978.

[22] Luc Devroye et al. Distribution-free inequalities for the deleted and holdout error estimates. IEEE Trans. Inf. Theory, 1979.

[23] Asuman E. Ozdaglar et al. Why random reshuffling beats stochastic gradient descent. Mathematical Programming, 2015.

[24] Martin Zinkevich et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. ICML, 2003.