Universal statistics of Fisher information in deep neural networks: mean field approach
[1] Sommers, et al. Chaos in random neural networks, 1988, Physical Review Letters.
[2] Saad, et al. Exact solution for on-line learning in multilayer neural networks, 1995, Physical Review Letters.
[3] Kenji Fukumizu, et al. A Regularity Condition of the Information Matrix of a Multilayer Perceptron Network, 1996, Neural Networks.
[4] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[5] Léon Bottou. Online Learning and Stochastic Approximations, 1998.
[6] Léon Bottou, et al. On-line learning and stochastic approximations, 1999.
[7] Kenji Fukumizu, et al. Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons, 2000, Neural Computation.
[8] Kenji Fukumizu, et al. Adaptive natural gradient learning algorithms for various stochastic models, 2000, Neural Networks.
[9] Shun-ichi Amari, et al. A method of statistical neurodynamics, 1974, Kybernetik.
[10] Fei-Fei Li, et al. What Does Classifying More Than 10,000 Image Categories Tell Us?, 2010, ECCV.
[11] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[12] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.
[13] Yann Ollivier, et al. Riemannian metrics for neural networks I: feedforward networks, 2013, arXiv:1303.0818.
[14] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[15] Razvan Pascanu, et al. On the Number of Linear Regions of Deep Neural Networks, 2014, NIPS.
[16] Razvan Pascanu, et al. Revisiting Natural Gradient for Deep Networks, 2013, ICLR.
[17] Ryota Tomioka, et al. Norm-Based Capacity Control in Neural Networks, 2015, COLT.
[18] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[19] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.
[20] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[21] Shun-ichi Amari, et al. Information Geometry and Its Applications, 2016.
[22] Ohad Shamir, et al. On the Quality of the Initial Basin in Overspecified Neural Networks, 2015, ICML.
[23] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[24] Surya Ganguli, et al. Exponential expressivity in deep neural networks through transient chaos, 2016, NIPS.
[25] S. Amari. Natural Gradient Learning and Its Dynamics in Singular Regions, 2016.
[26] Jonathan Kadmon, et al. Optimal Architectures in a Solvable Model of Deep Networks, 2016, NIPS.
[27] Yoram Singer, et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, 2016, NIPS.
[28] Surya Ganguli, et al. On the Expressive Power of Deep Neural Networks, 2016, ICML.
[29] Yuandong Tian, et al. An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis, 2017, ICML.
[30] Surya Ganguli, et al. Deep Information Propagation, 2016, ICLR.
[31] Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.
[32] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[33] Samuel S. Schoenholz, et al. Mean Field Residual Networks: On the Edge of Chaos, 2017, NIPS.
[34] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[35] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[36] Jeffrey Pennington, et al. Geometry of Neural Network Loss Surfaces via Random Matrix Theory, 2017, ICML.
[37] Quoc V. Le, et al. Understanding Generalization and Stochastic Gradient Descent, 2017.
[38] Jeffrey Pennington, et al. Nonlinear random matrix theory for deep learning, 2019, NIPS.
[39] Matthias Hein, et al. The Loss Surface of Deep and Wide Neural Networks, 2017, ICML.
[40] Yann Dauphin, et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks, 2017, ICLR.
[41] Jaehoon Lee, et al. Deep Neural Networks as Gaussian Processes, 2017, ICLR.
[42] Surya Ganguli, et al. The Emergence of Spectral Universality in Deep Networks, 2018, AISTATS.
[43] A. Montanari, et al. The landscape of empirical risk for nonconvex losses, 2016, The Annals of Statistics.
[44] Bo Li, et al. Exploring the Function Space of Deep-Learning Machines, 2017, Physical Review Letters.
[45] Jascha Sohl-Dickstein, et al. Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks, 2018, ICML.
[46] Quoc V. Le, et al. A Bayesian Perspective on Generalization and Stochastic Gradient Descent, 2017, ICLR.
[47] Samuel S. Schoenholz, et al. Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks, 2018, ICML.
[48] Elad Hoffer, et al. Exponentially vanishing sub-optimal local minima in multilayer neural networks, 2017, ICLR.
[49] Yann LeCun, et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks, 2018, arXiv.
[50] Tomaso A. Poggio, et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks, 2017, AISTATS.