Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

We consider derivative-free algorithms for stochastic optimization problems that use only noisy function values rather than gradients, analyzing their finite-sample convergence rates. We show that if pairs of function values are available, algorithms that use gradient estimates based on random perturbations suffer a convergence-rate penalty of at most a factor of √d relative to traditional stochastic gradient methods, where d is the problem dimension. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, showing that our bounds are sharp with respect to all problem-dependent quantities: they cannot be improved by more than constant factors.
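As a rough illustration of the paired-evaluation scheme described above, the following Python sketch builds a two-point zero-order gradient estimate from random perturbations and plugs it into plain stochastic gradient descent. The names (two_point_gradient_estimate, zero_order_sgd, the smoothing radius u, the 1/√t step-size decay) are illustrative choices of ours, not the paper's; the paper's algorithms and rates rest on mirror descent with carefully chosen perturbation distributions, and this is only a minimal sketch of the basic estimator.

```python
import numpy as np

def two_point_gradient_estimate(f, sample_xi, x, u, rng):
    """Two-point zero-order gradient estimate.

    f(x, xi) returns a noisy function value; the *same* sample xi is used for
    both evaluations, matching the paired-evaluation setting in the abstract.
    The perturbation direction z is uniform on the unit sphere and u > 0 is a
    small smoothing radius.
    """
    d = x.shape[0]
    z = rng.normal(size=d)
    z /= np.linalg.norm(z)                 # uniform direction on the unit sphere
    xi = sample_xi(rng)                    # one stochastic sample, reused below
    delta = f(x + u * z, xi) - f(x, xi)    # paired finite difference along z
    return (d / u) * delta * z             # estimate of the smoothed gradient

def zero_order_sgd(f, sample_xi, x0, steps, step_size, u, seed=0):
    """Unconstrained SGD driven by the two-point estimator above."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for t in range(1, steps + 1):
        g = two_point_gradient_estimate(f, sample_xi, x, u, rng)
        x -= (step_size / np.sqrt(t)) * g  # 1/sqrt(t) step-size decay
    return x

# Toy usage: minimize E[(a^T x - b + xi)^2], observing only function values.
if __name__ == "__main__":
    d = 10
    a = np.ones(d) / np.sqrt(d)
    b = 1.0
    f = lambda x, xi: (a @ x - b + xi) ** 2
    sample_xi = lambda rng: rng.normal(scale=0.1)
    x_hat = zero_order_sgd(f, sample_xi, np.zeros(d), steps=20000,
                           step_size=0.5, u=1e-3, seed=1)
    print("residual:", a @ x_hat - b)      # should be close to 0
```

Reusing the same sample xi in both evaluations is what makes the "pairs of function values" setting favorable: the common noise largely cancels in the finite difference, which is what keeps the dimension penalty down to roughly √d rather than something larger.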
