Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond