A semi-rigorous theory of the optimization landscape of Deep Nets: Bezout theorem and Boltzmann distribution
[1] Phan-Minh Nguyen,et al. Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks , 2019, ArXiv.
[2] Ruosong Wang,et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks , 2019, ICML.
[3] Daniel Kunin,et al. Loss Landscapes of Regularized Linear Autoencoders , 2019, ICML.
[4] Alexander Rakhlin,et al. Consistency of Interpolation with Laplace Kernels is a High-Dimensional Phenomenon , 2018, COLT.
[5] Yuan Cao,et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks , 2018, ArXiv.
[6] Yuanzhi Li,et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , 2018, NeurIPS.
[7] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[8] Barnabás Póczos,et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks , 2018, ICLR.
[9] Xiao Zhang,et al. Learning One-hidden-layer ReLU Networks via Gradient Descent , 2018, AISTATS.
[10] Tomaso A. Poggio,et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks , 2017, AISTATS.
[11] Adel Javanmard,et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks , 2017, IEEE Transactions on Information Theory.
[12] Qiang Liu,et al. On the Margin Theory of Feedforward Neural Networks , 2018, ArXiv.
[13] Yuanzhi Li,et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data , 2018, NeurIPS.
[14] Tengyuan Liang,et al. Just Interpolate: Kernel "Ridgeless" Regression Can Generalize , 2018, The Annals of Statistics.
[15] Wei Hu,et al. Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced , 2018, NeurIPS.
[16] Aleksander Madry,et al. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) , 2018, NeurIPS.
[17] Andrea Montanari,et al. A mean field view of the landscape of two-layer neural networks , 2018, Proceedings of the National Academy of Sciences.
[18] Tomaso A. Poggio,et al. Theory of Deep Learning IIb: Optimization Properties of SGD , 2018, ArXiv.
[19] Lorenzo Rosasco,et al. Theory of Deep Learning III: explaining the non-overfitting puzzle , 2017, ArXiv.
[20] Yuandong Tian,et al. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima , 2017, ICML.
[21] Nathan Srebro,et al. The Implicit Bias of Gradient Descent on Separable Data , 2017, J. Mach. Learn. Res.
[22] Yuandong Tian,et al. When is a Convolutional Filter Easy To Learn? , 2017, ICLR.
[23] Inderjit S. Dhillon,et al. Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels , 2017, ArXiv.
[24] Guillermo Sapiro,et al. Robust Large Margin Deep Neural Networks , 2017, IEEE Transactions on Signal Processing.
[25] Nathan Srebro,et al. Exploring Generalization in Deep Learning , 2017, NIPS.
[26] Matus Telgarsky,et al. Spectrally-normalized margin bounds for neural networks , 2017, NIPS.
[27] Inderjit S. Dhillon,et al. Recovery Guarantees for One-hidden-layer Neural Networks , 2017, ICML.
[28] Yuanzhi Li,et al. Convergence Analysis of Two-layer Neural Networks with ReLU Activation , 2017, NIPS.
[29] Noah Golowich,et al. Musings on Deep Learning: Properties of SGD , 2017.
[30] Tomaso A. Poggio,et al. Theory II: Landscape of the Empirical Risk in Deep Learning , 2017, ArXiv.
[31] Michael I. Jordan,et al. How to Escape Saddle Points Efficiently , 2017, ICML.
[32] Yuandong Tian,et al. An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis , 2017, ICML.
[33] Amit Daniely,et al. SGD Learns the Conjugate Kernel Class of the Network , 2017, NIPS.
[34] Amir Globerson,et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs , 2017, ICML.
[35] Matus Telgarsky,et al. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis , 2017, COLT.
[36] Yann LeCun,et al. Singularity of the Hessian in Deep Learning , 2016, ArXiv.
[37] Michael I. Jordan,et al. Gradient Descent Only Converges to Minimizers , 2016, COLT.
[38] Tim Salimans,et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.
[39] Yoram Singer,et al. Train faster, generalize better: Stability of stochastic gradient descent , 2015, ICML.
[40] Furong Huang,et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.
[41] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[42] Shie Mannor,et al. Robustness and generalization , 2010, Machine Learning.
[43] Gábor Lugosi,et al. Introduction to Statistical Learning Theory , 2004, Advanced Lectures on Machine Learning.
[44] Ji Zhu,et al. Margin Maximizing Loss Functions , 2003, NIPS.
[45] Nello Cristianini,et al. Kernel Methods for Pattern Analysis , 2006.
[46] Sun-Yuan Kung,et al. On gradient adaptation with unit-norm constraints , 2000, IEEE Trans. Signal Process.
[47] Paulo Jorge S. G. Ferreira,et al. The existence and uniqueness of the minimum norm solution to certain linear and nonlinear problems , 1996, Signal Process.
[48] B. Halpern. Fixed points of nonexpanding maps , 1967.