Theory of Deep Learning III: the non-overfitting puzzle
T. Poggio | H. Mhaskar | L. Rosasco | X. Boix | B. Miranda | Q. Liao | K. Kawaguchi | J. Hidary
[1] J. Czipszer, et al. Sur l'approximation d'une fonction périodique et de ses dérivées successives par un polynôme trigonométrique et par ses dérivées successives, 1958.
[2] Kurt Hornik, et al. Neural networks and principal component analysis: Learning from examples without local minima, 1989, Neural Networks.
[3] Hrushikesh Narhar Mhaskar, et al. Approximation properties of a multilayered feedforward artificial neural network, 1993, Adv. Comput. Math.
[4] Peter L. Bartlett, et al. Neural Network Learning: Theoretical Foundations, 1999.
[5] B. Aulbach, et al. The Hartman-Grobman theorem for Carathéodory-type differential equations in Banach spaces, 2000.
[6] 김희라. The structure of hope in Waiting for Godot, 2003.
[7] Y. Yao, et al. On Early Stopping in Gradient Descent Learning, 2007.
[8] Shie Mannor, et al. Robustness and Regularization of Support Vector Machines, 2008, J. Mach. Learn. Res.
[9] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[10] Lorenzo Rosasco, et al. Learning with Incremental Iterative Regularization, 2014, NIPS.
[11] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[12] Yoram Singer, et al. Train faster, generalize better: Stability of stochastic gradient descent, 2015, ICML.
[13] Yann LeCun, et al. Singularity of the Hessian in Deep Learning, 2016, ArXiv.
[14] T. Poggio, et al. Deep vs. shallow networks: An approximation theory perspective, 2016, ArXiv.
[15] Lorenzo Rosasco, et al. Optimal Rates for Multi-pass Stochastic Gradient Methods, 2016, J. Mach. Learn. Res.
[16] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[17] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[18] Tomaso A. Poggio, et al. Theory II: Landscape of the Empirical Risk in Deep Learning, 2017, ArXiv.
[19] Tengyu Ma, et al. Identity Matters in Deep Learning, 2016, ICLR.
[20] Noah Golowich, et al. Musings on Deep Learning: Properties of SGD, 2017.
[21] Lorenzo Rosasco, et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review, 2016, International Journal of Automation and Computing.
[22] Nathan Srebro, et al. The Implicit Bias of Gradient Descent on Separable Data, 2017, J. Mach. Learn. Res.
[23] Tomaso A. Poggio, et al. Theory of Deep Learning IIb: Optimization Properties of SGD, 2018, ArXiv.
[24] Tomaso A. Poggio, et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks, 2017, AISTATS.