Hossein Mobahi | Peter L. Bartlett | Mehrdad Farajtabar
[1] Zachary Chase Lipton, et al. Born Again Neural Networks, 2018, ICML.
[2] Hassan Ghasemzadeh, et al. Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher, 2019, ArXiv.
[3] Jonathan Ragan-Kelley, et al. Neural Kernels Without Tangents, 2020, ICML.
[4] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[5] Christoph H. Lampert, et al. Towards Understanding Knowledge Distillation, 2019, ICML.
[6] Tomaso A. Poggio, et al. Regularization Theory and Neural Networks Architectures, 1995, Neural Computation.
[7] Vivek Rathod, et al. Bayesian Dark Knowledge, 2015, NIPS.
[8] Masashi Sugiyama, et al. Bayesian Dark Knowledge, 2015.
[9] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[10] Jaehoon Lee, et al. Finite Versus Infinite Neural Networks: An Empirical Study, 2020, NeurIPS.
[11] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[12] Bernhard Schölkopf, et al. The Connection Between Regularization Operators and Support Vector Kernels, 1998, Neural Networks.
[13] Richard Socher, et al. A Closer Look at Deep Learning Heuristics: Learning Rate Restarts, Warmup and Distillation, 2018, ICLR.
[14] Yee Whye Teh, et al. Distral: Robust Multitask Reinforcement Learning, 2017, NIPS.
[15] Jaehoon Lee, et al. Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent, 2019, NeurIPS.
[16] Francis Bach, et al. On Lazy Training in Differentiable Programming, 2018, NeurIPS.
[17] Samuel B. Williams, et al. Association for Computing Machinery, 2009.
[18] Naiyan Wang, et al. Like What You Like: Knowledge Distill via Neuron Selectivity Transfer, 2017, ArXiv.
[19] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.
[20] Samy Bengio, et al. Understanding Deep Learning Requires Rethinking Generalization, 2016, ICLR.
[21] Xu Lan, et al. Knowledge Distillation by On-the-Fly Native Ensemble, 2018, NeurIPS.
[22] François Fleuret, et al. Knowledge Transfer with Jacobian Matching, 2018, ICML.
[23] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[24] Yoshua Bengio, et al. FitNets: Hints for Thin Deep Nets, 2014, ICLR.
[25] Benjamin M. Marlin, et al. Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks, 2020, UAI.
[26] Yan Lu, et al. Relational Knowledge Distillation, 2019, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] M. Wyart, et al. Disentangling Feature and Lazy Training in Deep Neural Networks, 2019.
[28] Ankit Singh Rawat, et al. Why Distillation Helps: A Statistical Perspective, 2020, ArXiv.
[29] Rui Zhang, et al. KDGAN: Knowledge Distillation with Generative Adversarial Networks, 2018, NeurIPS.
[30] P. Bartlett, et al. Local Rademacher Complexities, 2005, math/0508275.
[31] Willem Zuidema, et al. Transferring Inductive Biases through Knowledge Distillation, 2020, ArXiv.
[32] Rich Caruana, et al. Model Compression, 2006, KDD '06.
[33] Alan L. Yuille, et al. Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students, 2018, AAAI.
[34] Bernhard Schölkopf, et al. A Generalized Representer Theorem, 2001, COLT/EuroCOLT.
[35] Zhang-Wei Hong, et al. Collaborative Inter-agent Knowledge Distillation for Reinforcement Learning, 2019.
[36] Sergey Levine, et al. Divide-and-Conquer Reinforcement Learning, 2017, ICLR.
[37] Junmo Kim, et al. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Neil D. Lawrence, et al. Variational Information Distillation for Knowledge Transfer, 2019, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Bin Dong, et al. Distillation ≈ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval for Overparameterized Neural Networks, 2019, ArXiv.
[40] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] R. Venkatesh Babu, et al. Zero-Shot Knowledge Distillation in Deep Networks, 2019, ICML.
[42] H. Eom. Green's Functions: Applications, 2004.
[43] Sebastian Nowozin, et al. Hydra: Preserving Ensemble Diversity for Model Distillation, 2020, ArXiv.
[44] Peyman Milanfar, et al. A Tour of Modern Image Filtering: New Insights and Methods, Both Practical and Theoretical, 2013, IEEE Signal Processing Magazine.
[45] Derek Hoiem, et al. Learning without Forgetting, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.