Randomized smoothing for (parallel) stochastic optimization

By combining randomized smoothing techniques with accelerated gradient methods, we obtain convergence rates for stochastic optimization procedures, both in expectation and with high probability, that have optimal dependence on the variance of the gradient estimates. To the best of our knowledge, these are the first variance-based rates for non-smooth optimization. A combination of our techniques with recent work on decentralized optimization yields order-optimal parallel stochastic optimization algorithms. We give applications of our results to several statistical machine learning problems, providing experimental results (in the full version of the paper) demonstrating the effectiveness of our algorithms.
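As a rough illustration of the smoothing idea mentioned above, the sketch below estimates a gradient of the smoothed surrogate f_u(x) = E[f(x + uZ)] by averaging subgradients of a non-smooth f at randomly perturbed points. The Gaussian choice for Z, the L1-norm test function, and all names (`smoothed_gradient_estimate`, `u`, `num_samples`) are assumptions made for this example and are not taken from the abstract. It is not the paper's algorithm: the accelerated gradient step that would consume this estimate is omitted.

```python
import random


def l1_subgradient(x):
    # A valid subgradient of the L1 norm: sign(x_i), choosing 0 where x_i == 0.
    return [(v > 0) - (v < 0) for v in x]


def smoothed_gradient_estimate(subgrad, x, u, num_samples, rng):
    """Average subgradients of f at the perturbed points x + u * Z.

    Z is drawn with independent standard normal coordinates (an assumption
    of this sketch). The average is an unbiased estimate of the gradient of
    the smoothed function f_u(x) = E[f(x + u * Z)].
    """
    d = len(x)
    total = [0.0] * d
    for _ in range(num_samples):
        y = [x[i] + u * rng.gauss(0.0, 1.0) for i in range(d)]
        g = subgrad(y)
        for i in range(d):
            total[i] += g[i]
    return [t / num_samples for t in total]


rng = random.Random(0)  # fixed seed so the run is reproducible
x = [0.0, 2.0, -3.0]    # the first coordinate sits on the kink of |.|
estimate = smoothed_gradient_estimate(
    l1_subgradient, x, u=0.1, num_samples=2000, rng=rng
)
print(estimate)
```

At the kink (first coordinate) the averaged estimate is close to zero, while away from the kink it matches the ordinary gradient. Because the samples are independent, averaging `num_samples` of them divides the variance of the estimate by `num_samples`, and the per-sample work can be split across processors.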
