Deep learning with Elastic Averaging SGD
[1] R. J. Paul,et al. Optimization Theory: The Finite Dimensional Case , 1977 .
[2] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[3] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[4] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[5] V. Borkar. Asynchronous Stochastic Approximations , 1998 .
[6] Vivek S. Borkar,et al. Distributed Asynchronous Incremental Subgradient Methods , 2001 .
[7] Claudio Gentile,et al. On the generalization ability of on-line learning algorithms , 2001, IEEE Transactions on Information Theory.
[8] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[9] Yurii Nesterov,et al. Smooth minimization of non-smooth functions , 2005, Math. Program.
[10] John Langford,et al. Slow Learners are Fast , 2009, NIPS.
[11] Alexander J. Smola,et al. Parallelized Stochastic Gradient Descent , 2010, NIPS.
[12] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.
[13] Stephen P. Boyd,et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn.
[14] John Langford,et al. Scaling up machine learning: parallel and distributed approaches , 2011, KDD '11 Tutorials.
[15] John C. Duchi,et al. Distributed delayed stochastic optimization , 2011, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).
[16] Guanghui Lan,et al. An optimal method for stochastic composite optimization , 2011, Mathematical Programming.
[17] Asuman E. Ozdaglar,et al. Distributed Alternating Direction Method of Multipliers , 2012, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).
[18] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[19] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[20] Seunghak Lee,et al. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server , 2013, NIPS.
[21] Alexander G. Gray,et al. Stochastic Alternating Direction Method of Multipliers , 2013, ICML.
[22] Yann LeCun,et al. Regularization of Neural Networks using DropConnect , 2013, ICML.
[23] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[24] Dong Yu,et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs , 2014, INTERSPEECH.
[25] Marc'Aurelio Ranzato,et al. Multi-GPU Training of ConvNets , 2013, ICLR.
[26] R. Fergus,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.
[27] James T. Kwok,et al. Asynchronous Distributed ADMM for Consensus Optimization , 2014, ICML.
[28] Thomas Paine,et al. GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training , 2013, ICLR.
[29] Suvrit Sra,et al. Towards an optimal stochastic alternating direction method of multipliers , 2014, ICML.
[30] Ohad Shamir,et al. Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation , 2013, NIPS.
[31] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, AISTATS.
[32] Xinyun Chen. Delving into Transferable Adversarial Examples and Black-box Attacks , 2016 .