暂无分享,去创建一个
[1] Graham W. Taylor,et al. Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.
[2] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[4] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[5] Gang Niu,et al. Do We Need Zero Training Loss After Achieving Zero Training Error? , 2020, ICML.
[6] Seong Joon Oh,et al. CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[7] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[8] Y. Yao,et al. On Early Stopping in Gradient Descent Learning , 2007 .
[9] Nathan Srebro,et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning , 2017, NIPS.
[10] Stefano Soatto,et al. Entropy-SGD: biasing gradient descent into wide valleys , 2016, ICLR.
[11] Jonathon Shlens,et al. Explaining and Harnessing Adversarial Examples , 2014, ICLR.
[12] Colin Wei,et al. Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks , 2019, NeurIPS.
[13] Austin Wang,et al. Learning State-Dependent Losses for Inverse Dynamics Learning , 2020, ArXiv.
[14] Hongyi Zhang,et al. mixup: Beyond Empirical Risk Minimization , 2017, ICLR.
[15] Lorien Y. Pratt,et al. Comparing Biases for Minimal Network Construction with Back-Propagation , 1988, NIPS.
[16] Ruslan Salakhutdinov,et al. Geometry of Optimization and Implicit Regularization in Deep Learning , 2017, ArXiv.
[17] Risto Miikkulainen,et al. Improved Training Speed, Accuracy, and Data Utilization Through Loss Function Optimization , 2019, 2020 IEEE Congress on Evolutionary Computation (CEC).
[18] Thomas Brox,et al. Striving for Simplicity: The All Convolutional Net , 2014, ICLR.
[19] Hao Li,et al. Visualizing the Loss Landscape of Neural Nets , 2017, NeurIPS.
[20] Boaz Barak,et al. Deep double descent: where bigger models and more data hurt , 2019, ICLR.
[21] Nikos Komodakis,et al. Wide Residual Networks , 2016, BMVC.
[22] Risto Miikkulainen,et al. Evolving Loss Functions with Multivariate Taylor Polynomial Parameterizations , 2020, ArXiv.
[23] Hervé Bourlard,et al. Generalization and Parameter Estimation in Feedforward Netws: Some Experiments , 1989, NIPS.
[24] Stefano Soatto,et al. Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence , 2019, NeurIPS.
[25] Ruslan Salakhutdinov,et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks , 2015, NIPS.
[26] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[27] Yevgen Chebotar,et al. Meta Learning via Learned Loss , 2019, 2020 25th International Conference on Pattern Recognition (ICPR).
[28] Mikhail Belkin,et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off , 2018, Proceedings of the National Academy of Sciences.
[29] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[30] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.
[31] Guy Blanc,et al. Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process , 2019, COLT.