Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond

We look at the eigenvalues of the Hessian of a loss function before and after training. The eigenvalue distribution is seen to be composed of two parts: the bulk, which is concentrated around zero, and the edges, which are scattered away from zero. We present empirical evidence that the bulk indicates how over-parametrized the system is, and that the edges depend on the input data.
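
To make the bulk/edges split concrete, here is a minimal sketch on a toy model that is not taken from the paper: linear least squares with more parameters than samples. Assuming the loss L(w) = ||X w - y||^2 / (2n), the Hessian is X^T X / n. Its rank is at most n, so at least d - n of its d eigenvalues are exactly zero (the bulk, growing with over-parametrization). The remaining eigenvalues equal those of the n x n matrix X X^T / n, so they are fixed by the input data alone (the edges). The function names, the sizes, and the tolerance `tol` are illustrative assumptions. Because this loss is quadratic, its Hessian does not change during training, so the sketch does not reproduce the before/after comparison.

```python
import numpy as np


def least_squares_hessian(X):
    """Hessian of the toy loss L(w) = ||X w - y||^2 / (2n) with respect to w.

    For this quadratic loss the Hessian is constant, X^T X / n, and depends
    neither on the weights w nor on the targets y (assumed toy model).
    """
    n = X.shape[0]
    return X.T @ X / n


def split_spectrum(H, tol=1e-8):
    """Split the spectrum into (bulk, edges): |lambda| <= tol versus the rest.

    `tol` is a hypothetical numerical threshold for "zero".
    """
    eigvals = np.linalg.eigvalsh(H)  # H is symmetric; eigvalsh returns ascending eigenvalues
    bulk = eigvals[np.abs(eigvals) <= tol]
    edges = eigvals[np.abs(eigvals) > tol]
    return bulk, edges


rng = np.random.default_rng(0)
n_samples, n_params = 20, 100  # more parameters than samples: over-parametrized
X = rng.standard_normal((n_samples, n_params))

H = least_squares_hessian(X)
bulk, edges = split_spectrum(H)
print(len(bulk), len(edges))  # 80 20

# The nonzero eigenvalues coincide with those of the small Gram matrix
# X X^T / n, so they are determined entirely by the input data X.
gram_eigs = np.linalg.eigvalsh(X @ X.T / n_samples)
print(np.allclose(edges, gram_eigs))  # True
```

Increasing `n_params` while keeping `n_samples` fixed enlarges the zero bulk one-for-one, while the edges stay tied to the data matrix X.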