Theoretical issues in deep networks
[1] Christine McGourty. Dealing with the data, 1989, Nature.
[2] Tengyuan Liang, et al. Just Interpolate: Kernel "Ridgeless" Regression Can Generalize, 2018, The Annals of Statistics.
[3] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[4] Razvan Pascanu, et al. On the Number of Linear Regions of Deep Neural Networks, 2014, NIPS.
[5] Tomaso Poggio, et al. Unsupervised learning of invariant representations with low sample complexity: the magic of sensory cortex or a new framework for machine learning?, 2013, arXiv:1311.4158.
[6] Amit Daniely, et al. SGD Learns the Conjugate Kernel Class of the Network, 2017, NIPS.
[7] Yuanzhi Li, et al. Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, 2018, NeurIPS.
[8] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[9] Matus Telgarsky, et al. Representation Benefits of Deep Feedforward Networks, 2015, ArXiv.
[10] Dmitry Yarotsky, et al. Error bounds for approximations with deep ReLU networks, 2016, Neural Networks.
[11] Tomaso Poggio, et al. Complexity control by gradient descent in deep networks, 2020, Nature Communications.
[12] H. N. Mhaskar, et al. Neural Networks for Optimal Approximation of Smooth and Analytic Functions, 1996, Neural Computation.
[13] H. Mhaskar, et al. Neural networks for localized approximation, 1994.
[14] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.
[15] Tomaso A. Poggio, et al. Learning Real and Boolean Functions: When Is Deep Better Than Shallow, 2016, ArXiv.
[16] Xin Li, et al. Limitations of the approximation capabilities of neural networks with one hidden layer, 1996, Adv. Comput. Math.
[17] G. Petrova, et al. Nonlinear Approximation and (Deep) ReLU Networks, 2019, Constructive Approximation.
[18] Wei Hu, et al. Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced, 2018, NeurIPS.
[19] Lorenzo Rosasco, et al. Deep Convolutional Networks are Hierarchical Kernel Machines, 2015, ArXiv.
[20] Robert C. Wolpert, et al. A Review of the, 1985.
[21] Qiang Liu, et al. On the Margin Theory of Feedforward Neural Networks, 2018, ArXiv.
[22] Mikhail Belkin, et al. Two models of double descent for weak features, 2019, SIAM J. Math. Data Sci.
[23] T. Poggio, et al. Deep vs. shallow networks: An approximation theory perspective, 2016, ArXiv.
[24] Paulo Jorge S. G. Ferreira. The existence and uniqueness of the minimum norm solution to certain linear and nonlinear problems, 1996, Signal Process.
[25] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[26] Philipp Petersen, et al. Optimal approximation of piecewise smooth functions using deep ReLU neural networks, 2017, Neural Networks.
[27] Tomaso Poggio, et al. Notes on Hierarchical Splines, DCLNs and i-theory, 2015.
[28] Mikhail Belkin, et al. To understand deep learning we need to understand kernel learning, 2018, ICML.
[29] Andrea Montanari, et al. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve, 2019, Communications on Pure and Applied Mathematics.
[30] Roi Livni, et al. A Provably Efficient Algorithm for Training Deep Networks, 2013, ArXiv.
[31] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[32] T. Poggio, et al. The Mathematics of Learning: Dealing with Data, 2005, 2005 International Conference on Neural Networks and Brain.
[33] Tengyuan Liang, et al. On the Risk of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels, 2019, ArXiv.
[34] Lorenzo Rosasco, et al. Unsupervised learning of invariant representations, 2016, Theor. Comput. Sci.
[35] Tomaso Poggio, et al. I-theory on depth vs width: hierarchical function composition, 2015.
[36] Federico Girosi, et al. On the Relationship between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions, 1996, Neural Computation.
[37] Allan Pinkus, et al. Approximation theory of the MLP model in neural networks, 1999, Acta Numerica.
[38] Nathan Srebro, et al. Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models, 2019, ICML.
[39] Tomaso A. Poggio, et al. A Surprising Linear Relationship Predicts Test Performance in Deep Networks, 2018, ArXiv.
[40] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.
[41] Ohad Shamir, et al. Depth Separation in ReLU Networks for Approximating Smooth Non-Linear Functions, 2016, ArXiv.
[42] Hrushikesh Narhar Mhaskar, et al. Approximation properties of a multilayered feedforward artificial neural network, 1993, Adv. Comput. Math.
[43] Tomaso A. Poggio, et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks, 2017, AISTATS.
[44] H. Mhaskar. Neural networks for localized approximation of real functions, 1993, Neural Networks for Signal Processing III - Proceedings of the 1993 IEEE-SP Workshop.
[45] Alexander Rakhlin, et al. Consistency of Interpolation with Laplace Kernels is a High-Dimensional Phenomenon, 2018, COLT.
[46] Lorenzo Rosasco, et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review, 2016, International Journal of Automation and Computing.
[47] Gregory Piatetsky-Shapiro, et al. High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality, 2000.
[48] Aleksej F. Filippov, et al. Differential Equations with Discontinuous Righthand Sides, 1988, Mathematics and Its Applications.