Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
Yann Dauphin | Razvan Pascanu | Çaglar Gülçehre | Kyunghyun Cho | Surya Ganguli | Yoshua Bengio