[1] L. Rosasco, et al. Convergence of Stochastic Proximal Gradient Algorithm, 2014, Applied Mathematics & Optimization.
[2] Kfir Y. Levy, et al. Online to Offline Conversions, Universality and Adaptive Minibatch Sizes, 2017, NIPS.
[3] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev..
[4] Eric Moulines, et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, 2011, NIPS.
[5] Andreas Veit, et al. Why are Adaptive Methods Good for Attention Models?, 2020, NeurIPS.
[6] Mingrui Liu, et al. Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets, 2019, ICLR.
[7] Li Shen, et al. A Sufficient Condition for Convergences of Adam and RMSProp, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.
[9] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[10] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[11] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim..
[12] Tong Zhang, et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator, 2018, NeurIPS.
[13] Suvrit Sra, et al. Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity, 2019, ICLR.
[14] Ohad Shamir, et al. The Complexity of Finding Stationary Points with Stochastic Gradient Descent, 2020, ICML.
[15] Liyuan Liu, et al. On the Variance of the Adaptive Learning Rate and Beyond, 2019, ICLR.
[16] S. Gadat, et al. Optimal non-asymptotic bound of the Ruppert-Polyak averaging without strong convexity, 2017, arXiv:1709.03342.
[17] Bin Dong, et al. Nostalgic Adam: Weighing more of the past gradients when designing the adaptive learning rate, 2019, IJCAI.
[18] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci..
[19] Michael I. Jordan, et al. Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent, 2017, COLT.
[20] Xiaoxia Wu, et al. AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization, 2018, ICML.
[21] Alejandro Jofré, et al. On variance reduction for stochastic smooth convex optimization with multiplicative noise, 2017, Math. Program..
[22] Yi Zhang, et al. The Case for Full-Matrix Adaptive Regularization, 2018, ArXiv.
[23] Li Shen, et al. On the Convergence of Weighted AdaGrad with Momentum for Training Deep Neural Networks, 2018.
[24] Francesco Orabona, et al. On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes, 2018, AISTATS.
[25] Denis Yarats, et al. On the adequacy of untuned warmup for adaptive optimization, 2019, AAAI.
[26] Ohad Shamir, et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization, 2011, ICML.
[27] John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[28] Li Shen, et al. Weighted AdaGrad with Unified Momentum, 2018.
[29] Yong Yu, et al. AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods, 2018, ICLR.
[30] Sashank J. Reddi, et al. Why ADAM Beats SGD for Attention Models, 2019, ArXiv.
[31] Sanjiv Kumar, et al. Escaping Saddle Points with Adaptive Gradient Methods, 2019, ICML.
[32] Richard Socher, et al. Regularizing and Optimizing LSTM Language Models, 2017, ICLR.
[33] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res..
[34] Sébastien Bubeck, et al. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn..
[35] Volkan Cevher, et al. Online Adaptive Methods, Universality and Acceleration, 2018, NeurIPS.
[36] Martin J. Wainwright, et al. Information-theoretic lower bounds on the oracle complexity of convex optimization, 2009, NIPS.
[37] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[38] Mingyi Hong, et al. On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization, 2018, ICLR.
[39] Nathan Srebro, et al. Lower Bounds for Non-Convex Stochastic Optimization, 2019, ArXiv.
[40] Yuan Cao, et al. On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization, 2018, ArXiv.
[41] Saeed Ghadimi, et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization I: A Generic Algorithmic Framework, 2012, SIAM J. Optim..