An Overview of Some Issues in the Theory of Deep Networks
[1] T. Poggio, et al. General conditions for predictivity in learning theory, 2004, Nature.
[2] Mikhail Belkin, et al. To understand deep learning we need to understand kernel learning, 2018, ICML.
[3] Alexander Rakhlin, et al. Consistency of Interpolation with Laplace Kernels is a High-Dimensional Phenomenon, 2018, COLT.
[4] Wei Hu, et al. Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced, 2018, NeurIPS.
[5] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[6] Tomaso Poggio, et al. Notes on Hierarchical Splines, DCLNs and i-theory, 2015.
[7] Sayan Mukherjee, et al. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization, 2006, Adv. Comput. Math.
[8] Zizhong Chen, et al. Condition Numbers of Gaussian Random Matrices, 2005, SIAM J. Matrix Anal. Appl.
[9] Roi Livni, et al. A Provably Efficient Algorithm for Training Deep Networks, 2013, arXiv.
[10] Noureddine El Karoui, et al. The spectrum of kernel random matrices, 2010, arXiv:1001.0492.
[11] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.
[12] Andrea Montanari, et al. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve, 2019, Communications on Pure and Applied Mathematics.
[13] Tengyuan Liang, et al. Just Interpolate: Kernel "Ridgeless" Regression Can Generalize, 2018, The Annals of Statistics.
[14] Lorenzo Rosasco, et al. Deep Convolutional Networks are Hierarchical Kernel Machines, 2015, arXiv.
[15] Dmitry Yarotsky, et al. Error bounds for approximations with deep ReLU networks, 2016, Neural Networks.
[16] Qiang Liu, et al. On the Margin Theory of Feedforward Neural Networks, 2018, arXiv.
[17] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.
[18] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[19] M. Rudelson, et al. The smallest singular value of a random rectangular matrix, 2008, arXiv:0802.3956.
[20] Andrew M. Saxe, et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.
[21] Tomaso A. Poggio, et al. Theory II: Landscape of the Empirical Risk in Deep Learning, 2017, arXiv.
[22] Nathan Srebro, et al. Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models, 2019, ICML.
[23] Allan Pinkus, et al. Approximation theory of the MLP model in neural networks, 1999, Acta Numerica.
[24] Xin Li, et al. Limitations of the approximation capabilities of neural networks with one hidden layer, 1996, Adv. Comput. Math.
[25] Ohad Shamir, et al. Depth Separation in ReLU Networks for Approximating Smooth Non-Linear Functions, 2016, arXiv.
[26] Tomaso Poggio, et al. Double descent in the condition number, 2019, arXiv.
[27] A. Turing. Rounding-off errors in matrix processes, 1948.
[28] Paulo Jorge S. G. Ferreira, et al. The existence and uniqueness of the minimum norm solution to certain linear and nonlinear problems, 1996, Signal Process.
[29] H. Mhaskar, et al. Neural networks for localized approximation, 1994.
[30] V. Marčenko, et al. Distribution of eigenvalues for some sets of random matrices, 1967.
[31] G. Petrova, et al. Nonlinear Approximation and (Deep) ReLU Networks, 2019, Constructive Approximation.
[32] Philipp Petersen, et al. Optimal approximation of piecewise smooth functions using deep ReLU neural networks, 2017, Neural Networks.
[33] Hrushikesh Narhar Mhaskar, et al. Approximation properties of a multilayered feedforward artificial neural network, 1993, Adv. Comput. Math.
[34] Tomaso Poggio, et al. I-theory on depth vs width: hierarchical function composition, 2015.
[35] T. Poggio, et al. Deep vs. shallow networks: An approximation theory perspective, 2016, arXiv.
[36] Razvan Pascanu, et al. On the Number of Linear Regions of Deep Neural Networks, 2014, NIPS.
[37] H. Mhaskar. Neural networks for localized approximation of real functions, 1993, Neural Networks for Signal Processing III - Proceedings of the 1993 IEEE-SP Workshop.
[38] André Elisseeff, et al. Algorithmic Stability and Generalization Performance, 2000, NIPS.
[39] Guillermo Sapiro, et al. On the Stability of Deep Networks, 2014, ICLR.
[40] Tomaso Poggio, et al. Complexity control by gradient descent in deep networks, 2020, Nature Communications.
[41] Tomaso A. Poggio, et al. Regularization Networks and Support Vector Machines, 2000, Adv. Comput. Math.
[42] Lorenzo Rosasco, et al. Learning with Incremental Iterative Regularization, 2014, NIPS.
[43] Ohad Shamir, et al. Learnability, Stability and Uniform Convergence, 2010, J. Mach. Learn. Res.
[44] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.
[45] Tomaso A. Poggio, et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks, 2017, AISTATS.
[46] Ji Zhu, et al. Margin Maximizing Loss Functions, 2003, NIPS.
[47] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[48] Mikhail Belkin, et al. Two models of double descent for weak features, 2019, SIAM J. Math. Data Sci.
[49] Matus Telgarsky, et al. Representation Benefits of Deep Feedforward Networks, 2015, arXiv.
[50] Lorenzo Rosasco, et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review, 2016, International Journal of Automation and Computing.