暂无分享,去创建一个
[1] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..
[2] Kurt Hornik,et al. Multilayer feedforward networks are universal approximators , 1989, Neural Networks.
[3] Kurt Hornik,et al. Approximation capabilities of multilayer feedforward networks , 1991, Neural Networks.
[4] László Györfi,et al. A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.
[5] Jürgen Schmidhuber,et al. Flat Minima , 1997, Neural Computation.
[6] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[7] Peter L. Bartlett,et al. Neural Network Learning - Theoretical Foundations , 1999 .
[8] Allan Pinkus,et al. Approximation theory of the MLP model in neural networks , 1999, Acta Numerica.
[9] Adam Krzyzak,et al. A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.
[10] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[11] AI Koan. Weighted Sums of Random Kitchen Sinks : Replacing minimization with randomization in learning , 2008 .
[12] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[13] Weifeng Liu,et al. Adaptive and Learning Systems for Signal Processing, Communication, and Control , 2010 .
[14] Yoshua Bengio,et al. Shallow vs. Deep Sum-Product Networks , 2011, NIPS.
[15] J. T. Spooner,et al. Adaptive and Learning Systems for Signal Processing , Communications , and Control , 2013 .
[16] D. Costarelli,et al. Constructive Approximation by Superposition of Sigmoidal Functions , 2013 .
[17] Thorsten Brants,et al. One billion word benchmark for measuring progress in statistical language modeling , 2013, INTERSPEECH.
[18] Razvan Pascanu,et al. On the Number of Linear Regions of Deep Neural Networks , 2014, NIPS.
[19] Joan Bruna,et al. Intriguing properties of neural networks , 2013, ICLR.
[20] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.
[21] Ruslan Salakhutdinov,et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks , 2015, NIPS.
[22] Alexander Cloninger,et al. Provable approximation properties for deep neural networks , 2015, ArXiv.
[23] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[24] Zhanxing Zhu,et al. Neural Information Processing Systems (NIPS) , 2015 .
[25] Jonathon Shlens,et al. Explaining and Harnessing Adversarial Examples , 2014, ICLR.
[26] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[27] Serge J. Belongie,et al. Residual Networks Behave Like Ensembles of Relatively Shallow Networks , 2016, NIPS.
[28] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Matus Telgarsky,et al. Benefits of Depth in Neural Networks , 2016, COLT.
[30] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[31] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.
[32] T. Poggio,et al. Deep vs. shallow networks : An approximation theory perspective , 2016, ArXiv.
[33] Ohad Shamir,et al. The Power of Depth for Feedforward Neural Networks , 2015, COLT.
[34] Yoram Singer,et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity , 2016, NIPS.
[35] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[36] Leslie Pack Kaelbling,et al. Generalization in Deep Learning , 2017, ArXiv.
[37] Gintare Karolina Dziugaite,et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data , 2017, UAI.
[38] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[39] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[40] Tengyu Ma,et al. Identity Matters in Deep Learning , 2016, ICLR.
[41] Matus Telgarsky,et al. Spectrally-normalized margin bounds for neural networks , 2017, NIPS.
[42] Stefano Soatto,et al. Entropy-SGD: biasing gradient descent into wide valleys , 2016, ICLR.
[43] Nathan Srebro,et al. Exploring Generalization in Deep Learning , 2017, NIPS.
[44] Mikhail Belkin,et al. To understand deep learning we need to understand kernel learning , 2018, ICML.
[45] Huan Wang,et al. Identifying Generalization Properties in Neural Networks , 2018, ArXiv.
[46] A. Jadbabaie,et al. Finite sample expressive power of small-width ReLU networks , 2018, ArXiv.
[47] David A. McAllester,et al. A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks , 2017, ICLR.
[48] Mikhail Belkin,et al. Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate , 2018, NeurIPS.
[49] David Rolnick,et al. The power of deeper networks for expressing natural functions , 2017, ICLR.
[50] David A. Wagner,et al. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples , 2018, ICML.
[51] Tomaso A. Poggio,et al. Theory IIIb: Generalization in Deep Networks , 2018, ArXiv.
[52] Lukasz Kaiser,et al. Generating Wikipedia by Summarizing Long Sequences , 2018, ICLR.
[53] Aleksander Madry,et al. Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.
[54] Yi Zhang,et al. Stronger generalization bounds for deep nets via a compression approach , 2018, ICML.
[55] Quoc V. Le,et al. A Bayesian Perspective on Generalization and Stochastic Gradient Descent , 2017, ICLR.
[56] Matthias Hein,et al. Optimization Landscape and Expressivity of Deep CNNs , 2017, ICML.
[57] Mikhail Belkin,et al. Reconciling modern machine learning and the bias-variance trade-off , 2018, ArXiv.
[58] Arthur Jacot,et al. Neural tangent kernel: convergence and generalization in neural networks (invited paper) , 2018, NeurIPS.
[59] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[60] Babak Hassibi,et al. Stochastic Mirror Descent on Overparameterized Nonlinear Models , 2019, IEEE Transactions on Neural Networks and Learning Systems.
[61] Yuan Cao,et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks , 2018, ArXiv.
[62] Suvrit Sra,et al. Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity , 2018, NeurIPS.
[63] Yuanzhi Li,et al. A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.
[64] Ryan P. Adams,et al. Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach , 2018, ICLR.
[65] John K. Tsotsos,et al. Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing , 2018, 2019 16th Conference on Computer and Robot Vision (CRV).
[66] Jaehoon Lee,et al. Wide neural networks of any depth evolve as linear models under gradient descent , 2019, NeurIPS.
[67] Barnabás Póczos,et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks , 2018, ICLR.
[68] Tomaso A. Poggio,et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks , 2017, AISTATS.
[69] Niladri S. Chatterji,et al. The intriguing role of module criticality in the generalization of deep networks , 2019, ICLR.
[70] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[71] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[72] A. Dosovitskiy,et al. MLP-Mixer: An all-MLP Architecture for Vision , 2021, NeurIPS.
[73] Ariel Kleiner,et al. Sharpness-Aware Minimization for Efficiently Improving Generalization , 2020, ICLR.