Memo No. 072, December 27, 2017. Theory of Deep Learning IIb: Optimization Properties of SGD
[1] Noah Golowich, et al. Musings on Deep Learning: Properties of SGD, 2017.
[2] Tomaso A. Poggio, et al. Theory II: Landscape of the Empirical Risk in Deep Learning, 2017, ArXiv.
[3] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[4] Lorenzo Rosasco, et al. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review, 2016, International Journal of Automation and Computing.
[5] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[6] John N. Tsitsiklis, et al. Gradient Convergence in Gradient Methods with Errors, 1999, SIAM J. Optim..
[7] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[8] S. Mitter, et al. Recursive stochastic algorithms for global optimization in R^d, 1991.
[9] B. Gidas. Global optimization via the Langevin equation, 1985, 24th IEEE Conference on Decision and Control.