Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods
Jascha Sohl-Dickstein | Ben Poole | Surya Ganguli