Randomized Smoothing for Stochastic Optimization

We analyze convergence rates of stochastic optimization procedures for non-smooth convex optimization problems. By combining randomized smoothing techniques with accelerated gradient methods, we obtain convergence rates, both in expectation and with high probability, that have optimal dependence on the variance of the gradient estimates. To the best of our knowledge, these are the first variance-based rates for non-smooth optimization. We give several applications of our results to statistical estimation problems and provide experimental results that demonstrate the effectiveness of the proposed algorithms. We also describe how combining our algorithm with recent work on decentralized optimization yields an order-optimal distributed stochastic optimization algorithm.
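
As a rough illustration of the randomized smoothing idea, the sketch below estimates the gradient of a smoothed surrogate f_u(x) = E[f(x + uZ)] by averaging subgradients of f at randomly perturbed points. The choices here are assumptions made for illustration only and are not taken from the paper: f is the l1 norm, Z is standard Gaussian, and the names `l1_subgradient` and `smoothed_gradient` are hypothetical. The accelerated gradient step is not shown.

```python
import random

def l1_subgradient(x):
    """Return a subgradient of f(x) = ||x||_1: the sign of each coordinate (0 at 0)."""
    return [(v > 0) - (v < 0) for v in x]

def smoothed_gradient(x, u, m, rng):
    """Monte Carlo estimate of the gradient of f_u(x) = E[f(x + u*Z)], Z ~ N(0, I).

    Averages subgradients of f at m independently perturbed points.
    Gaussian perturbation and the l1 objective are illustrative assumptions.
    """
    d = len(x)
    total = [0.0] * d
    for _ in range(m):
        # Perturb the query point with scaled Gaussian noise.
        y = [xi + u * rng.gauss(0.0, 1.0) for xi in x]
        g = l1_subgradient(y)
        for i in range(d):
            total[i] += g[i]
    return [t / m for t in total]

rng = random.Random(0)  # fixed seed so the run is reproducible
grad = smoothed_gradient([1.0, 0.0, -1.0], u=0.1, m=2000, rng=rng)
```

Averaging over more samples m reduces the variance of the estimate, which is the quantity the stated rates depend on. At the kink (the middle coordinate), the estimate concentrates near 0, whereas a single subgradient there is arbitrary.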
