Pathological Spectra of the Fisher Information Metric and Its Variants in Deep Neural Networks