On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points
Chi Jin | Praneeth Netrapalli | Rong Ge | Sham M. Kakade | Michael I. Jordan
[1] Michael I. Jordan, et al. Gradient Descent Can Take Exponential Time to Escape Saddle Points, 2017, NIPS.
[2] Quanquan Gu, et al. Finding Local Minima via Stochastic Nested Variance Reduction, 2018, ArXiv.
[3] Kfir Y. Levy, et al. The Power of Normalization: Faster Evasion of Saddle Points, 2016, ArXiv.
[4] Michael I. Jordan, et al. A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm, 2019, ArXiv.
[5] Daniel P. Robinson, et al. A trust region algorithm with a worst-case iteration complexity of $\mathcal{O}(\epsilon^{-3/2})$ for nonconvex optimization, 2017, Math. Program.
[6] Michael I. Jordan, et al. CoCoA: A General Framework for Communication-Efficient Distributed Optimization, 2016, J. Mach. Learn. Res.
[7] Michael I. Jordan, et al. Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent, 2017, COLT.
[8] Thomas Hofmann, et al. Escaping Saddles with Stochastic Gradients, 2018, ICML.
[9] Yair Carmon, et al. Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step, 2016, ArXiv.
[10] John Wright, et al. A Geometric Analysis of Phase Retrieval, 2016, IEEE International Symposium on Information Theory (ISIT).
[11] Quanquan Gu, et al. Stochastic Recursive Variance-Reduced Cubic Regularization Methods, 2019, AISTATS.
[12] Nicolas Boumal, et al. On the low-rank approach for semidefinite programs arising in synchronization and community detection, 2016, COLT.
[13] Michael I. Jordan, et al. Stochastic Cubic Regularization for Fast Nonconvex Optimization, 2017, NeurIPS.
[14] Yuchen Zhang, et al. A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics, 2017, COLT.
[15] Stephen J. Wright, et al. Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[16] A. Bovier, et al. Metastability in Reversible Diffusion Processes I: Sharp Asymptotics for Capacities and Exit Times, 2004.
[17] Zhouchen Lin, et al. Sharp Analysis for Nonconvex SGD Escaping from Saddle Points, 2019, COLT.
[18] Tianbao Yang, et al. First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time, 2017, NeurIPS.
[19] Yair Carmon, et al. Accelerated Methods for Non-Convex Optimization, 2016, SIAM J. Optim.
[20] Alexander J. Smola, et al. A Generic Approach for Escaping Saddle Points, 2017, AISTATS.
[21] Michael I. Jordan, et al. Gradient Descent Only Converges to Minimizers, 2016, COLT.
[22] Andrea Montanari, et al. Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality, 2017, COLT.
[23] Y. Nesterov. A method for solving the convex programming problem with convergence rate $O(1/k^2)$, 1983.
[24] H. Robbins. A Stochastic Approximation Method, 1951.
[25] Yurii Nesterov, et al. Cubic regularization of Newton method and its global performance, 2006, Math. Program.
[26] Boris Polyak. Gradient methods for the minimisation of functionals, 1963.
[27] Yurii Nesterov. Squared Functional Systems and Optimization Problems, 2000.
[28] R. Tweedie, et al. Exponential convergence of Langevin distributions and their discrete approximations, 1996.
[29] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[30] Kenji Kawaguchi. Deep Learning without Poor Local Minima, 2016, NIPS.
[31] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[32] Tong Zhang, et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator, 2018, NeurIPS.
[33] Zeyuan Allen-Zhu. Natasha 2: Faster Non-Convex Optimization Than SGD, 2017, NeurIPS.
[34] Yair Carmon, et al. Lower bounds for finding stationary points II: first-order methods, 2017, Math. Program.
[35] Martin J. Wainwright, et al. Communication-efficient algorithms for statistical optimization, 2012, 51st IEEE Conference on Decision and Control (CDC).
[36] Anima Anandkumar, et al. Efficient approaches for escaping higher order saddle points in non-convex optimization, 2016, COLT.
[37] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[38] Nathan Srebro, et al. Global Optimality of Local Search for Low Rank Matrix Recovery, 2016, NIPS.
[39] Tengyu Ma, et al. Matrix Completion has No Spurious Local Minimum, 2016, NIPS.
[40] John Wright, et al. Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture, 2015, IEEE Trans. Inf. Theory.
[41] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[42] Yuanzhi Li, et al. Neon2: Finding Local Minima via First-Order Oracles, 2017, NeurIPS.
[43] Tengyu Ma, et al. Finding approximate local minima faster than gradient descent, 2016, STOC.
[44] Yi Zheng, et al. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis, 2017, ICML.
[45] Nicolas Boumal, et al. The non-convex Burer-Monteiro approach works on smooth semidefinite programs, 2016, NIPS.
[46] A. S. Nemirovsky, D. B. Yudin. Problem Complexity and Method Efficiency in Optimization, 1983, Wiley.
[47] Yair Carmon, et al. Lower bounds for finding stationary points I, 2017, Math. Program.