[1] R. Bucy,et al. Stability and positive supermartingales , 1965 .
[2] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[3] Thibault Langlois,et al. Parameter adaptation in stochastic optimization , 1999 .
[4] Nicol N. Schraudolph,et al. Local Gain Adaptation in Stochastic Gradient Descent , 1999 .
[5] Kenji Fukumizu,et al. Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons , 2000, Neural Computation.
[6] Nicol N. Schraudolph,et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent , 2002, Neural Computation.
[7] Yann LeCun,et al. Large Scale Online Learning , 2003, NIPS.
[8] Warren B. Powell,et al. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming , 2006, Machine Learning.
[9] H. Robbins. A Stochastic Approximation Method , 1951 .
[10] Léon Bottou,et al. The Tradeoffs of Large Scale Learning , 2007, NIPS.
[11] Nicolas Le Roux,et al. Topmoumoute Online Natural Gradient Algorithm , 2007, NIPS.
[12] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[13] Patrick Gallinari,et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent , 2009, J. Mach. Learn. Res.
[14] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res.
[15] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[16] Andrew W. Fitzgibbon,et al. A fast natural Newton method , 2010, ICML.
[17] Eric Moulines,et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.
[18] Wei Xu,et al. Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent , 2011, ArXiv.
[19] O. Chapelle. Improved Preconditioner for Hessian Free Optimization , 2011 .
[20] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.
[21] Ilya Sutskever,et al. Estimating the Hessian by Back-propagating Curvature , 2012, ICML.
[22] Tom Schaul,et al. Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients , 2013, ICLR.