Random Shuffling Beats SGD after Finite Epochs
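Below is a minimal sketch of the two sampling schemes the title contrasts: with-replacement SGD draws a fresh component index at every step, while random reshuffling (RR) draws one uniform permutation per epoch and visits each component exactly once. This is an illustration only; the gradient oracle `grad_f_i` and the least-squares example are assumed names for demonstration, not code from the paper.

```python
import numpy as np

def sgd_with_replacement(grad_f_i, x0, n, lr, epochs, rng):
    """Plain SGD: every step samples a component index i.i.d. with replacement."""
    x = x0.copy()
    for _ in range(epochs * n):
        i = rng.integers(n)          # uniform index, with replacement
        x = x - lr * grad_f_i(i, x)  # stochastic gradient step
    return x

def sgd_random_reshuffling(grad_f_i, x0, n, lr, epochs, rng):
    """Random reshuffling: a fresh permutation per epoch, no replacement."""
    x = x0.copy()
    for _ in range(epochs):
        for i in rng.permutation(n):  # each component visited exactly once per epoch
            x = x - lr * grad_f_i(i, x)
    return x

# Illustrative use on least squares, f_i(x) = 0.5 * (a_i @ x - b_i) ** 2
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
grad = lambda i, x: (A[i] @ x - b[i]) * A[i]
x_rr = sgd_random_reshuffling(grad, np.zeros(5), n=100, lr=0.01, epochs=50, rng=rng)
```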
[1] Dimitri P. Bertsekas, et al. Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey, 2015, ArXiv.
[2] Tom Goldstein, et al. Efficient Distributed SGD with Variance Reduction, 2015, 2016 IEEE 16th International Conference on Data Mining (ICDM).
[3] Yurii Nesterov, et al. Cubic regularization of Newton method and its global performance, 2006, Math. Program.
[4] F. Krahmer, et al. An arithmetic–geometric mean inequality for products of three matrices, 2014, arXiv:1411.0333.
[5] L. Bottou. Curiously Fast Convergence of some Stochastic Gradient Descent Algorithms, 2009.
[6] Ohad Shamir, et al. Dimension-Free Iteration Complexity of Finite Sum Optimization Problems, 2016, NIPS.
[7] Ohad Shamir, et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization, 2011, ICML.
[8] A. S. Nemirovsky, D. B. Yudin. Problem Complexity and Method Efficiency in Optimization, 1983.
[9] Paul Tseng, et al. An Incremental Gradient(-Projection) Method with Momentum Term and Adaptive Stepsize Rule, 1998, SIAM J. Optim.
[10] Asuman E. Ozdaglar, et al. Why random reshuffling beats stochastic gradient descent, 2015, Mathematical Programming.
[11] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.
[12] Shai Shalev-Shwartz, et al. Stochastic dual coordinate ascent methods for regularized loss, 2012, J. Mach. Learn. Res.
[13] Ali H. Sayed, et al. Stochastic Learning under Random Reshuffling, 2018, ArXiv.
[14] Boris Polyak. Gradient methods for the minimisation of functionals, 1963.
[15] Francis Bach, et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.
[16] Ruoyu Sun, et al. Worst-case complexity of cyclic coordinate descent: $O(n^2)$, 2016, Mathematical Programming.
[17] Mikhail V. Solodov, et al. Incremental Gradient Algorithms with Stepsizes Bounded Away from Zero, 1998, Comput. Optim. Appl.
[18] Ali H. Sayed, et al. Stochastic Learning Under Random Reshuffling With Constant Step-Sizes, 2018, IEEE Transactions on Signal Processing.
[19] Mark W. Schmidt, et al. Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron, 2018, AISTATS.
[20] Stephen J. Wright, et al. Optimization for Machine Learning, 2013.
[21] A. Ozdaglar, et al. Convergence Rate of Incremental Gradient and Newton Methods, 2015.
[22] Asuman E. Ozdaglar, et al. When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent, 2017, NIPS.
[23] Teuvo Kohonen, et al. An Adaptive Associative Memory Principle, 1974, IEEE Transactions on Computers.
[24] Peter Richtárik, et al. SGD and Hogwild! Convergence Without the Bounded Gradients Assumption, 2018, ICML.
[25] Eric Moulines, et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, 2011, NIPS.
[26] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[27] Alexander J. Smola, et al. Stochastic Variance Reduction for Nonconvex Optimization, 2016, ICML.
[28] Ohad Shamir, et al. Without-Replacement Sampling for Stochastic Gradient Methods, 2016, NIPS.
[29] D. Bertsekas, et al. Convergence Rate of Incremental Subgradient Algorithms, 2000.
[30] Christopher Ré, et al. Towards a unified architecture for in-RDBMS analytics, 2012, SIGMOD Conference.
[31] Prateek Jain, et al. SGD without Replacement: Sharper Rates for General Smooth Convex Functions, 2019, ICML.
[32] Stephen J. Wright, et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[33] Stephen J. Wright, et al. Analyzing random permutations for cyclic coordinate descent, 2020, Math. Comput.
[34] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[35] Stephen J. Wright, et al. Random permutations fix a worst case for cyclic coordinate descent, 2016, IMA Journal of Numerical Analysis.
[36] Léon Bottou, et al. Stochastic Gradient Descent Tricks, 2012, Neural Networks: Tricks of the Trade.
[37] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.
[38] Justin Domke, et al. Finito: A faster, permutable incremental gradient method for big data problems, 2014, ICML.
[39] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[40] Tianbao Yang, et al. Distributed Stochastic Variance Reduced Gradient Methods and A Lower Bound for Communication Complexity, 2015.
[41] Mark W. Schmidt, et al. Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition, 2013, arXiv:1308.6370.
[42] Teng Zhang. A note on the non-commutative arithmetic-geometric mean inequality, 2014, arXiv:1411.5058.
[43] Elad Hazan, et al. An optimal algorithm for stochastic strongly-convex optimization, 2010, arXiv:1006.2425.
[44] B. Recht, et al. Beneath the valley of the noncommutative arithmetic-geometric mean inequality: conjectures, case-studies, and consequences, 2012, arXiv:1202.4184.