Small nonlinearities in activation functions create bad local minima in neural networks