Improved Sample Complexity for Stochastic Compositional Variance Reduced Gradient

Convex composition optimization is an emerging topic that covers a wide range of applications arising from stochastic optimal control, reinforcement learning and multistage stochastic programming. Existing algorithms suffer from unsatisfactory sample complexity and practical issues since they ignore the convexity structure in the algorithmic design. In this paper, we develop a new stochastic compositional variance-reduced gradient algorithm with the sample complexity of O((m + n)log(1/ε) + 1/ε3) where m + n is the total number of samples. Our algorithm is near-optimal as the dependence on m + n is optimal up to a logarithmic factor. Experimental results on real-world datasets demonstrate the effectiveness and efficiency of the new algorithm.

[1]  Jorge Nocedal,et al.  An investigation of Newton-Sketch and subsampled Newton methods , 2017, Optim. Methods Softw..

[2]  J. Lafferty,et al.  Sparse additive models , 2007, 0711.4555.

[3]  Mark W. Schmidt,et al.  Minimizing finite sums with the stochastic average gradient , 2013, Mathematical Programming.

[4]  Zeyuan Allen-Zhu,et al.  Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter , 2017, ICML.

[5]  Mengdi Wang,et al.  Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions , 2014, Mathematical Programming.

[6]  Jiaqiao Hu,et al.  Model-Based Stochastic Search Methods , 2015 .

[7]  Tong Zhang,et al.  Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.

[8]  Alexander Shapiro,et al.  Lectures on Stochastic Programming: Modeling and Theory , 2009 .

[9]  Mengdi Wang,et al.  Accelerating Stochastic Composition Optimization , 2016, NIPS.

[10]  Babu Narayanan,et al.  POWER SYSTEM STABILITY AND CONTROL , 2015 .

[11]  Ohad Shamir,et al.  Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity , 2015, ICML.

[12]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[13]  Nathan Srebro,et al.  Tight Complexity Bounds for Optimizing Composite Objectives , 2016, NIPS.

[14]  Shai Shalev-Shwartz,et al.  Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..

[15]  Katta G. Murty,et al.  Some NP-complete problems in quadratic and nonlinear programming , 1987, Math. Program..

[16]  Shai Shalev-Shwartz,et al.  SDCA without Duality , 2015, ArXiv.

[17]  Stephen P. Boyd,et al.  Proximal Algorithms , 2013, Found. Trends Optim..

[18]  J. Horowitz,et al.  VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS. , 2010, Annals of statistics.

[19]  Ethan X. Fang,et al.  Multi-Level Stochastic Gradient Methods for Nested Compositon Optimization , 2018 .

[20]  Michael I. Jordan,et al.  Non-convex Finite-Sum Optimization Via SCSG Methods , 2017, NIPS.

[21]  Léon Bottou,et al.  A Lower Bound for the Optimization of Finite Sums , 2014, ICML.

[22]  Heng Huang,et al.  Accelerated Method for Stochastic Composition Optimization with Nonsmooth Regularization , 2017, AAAI.

[23]  Shai Shalev-Shwartz,et al.  Stochastic dual coordinate ascent methods for regularized loss , 2012, J. Mach. Learn. Res..

[24]  Alexander J. Smola,et al.  Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.

[25]  Mengdi Wang,et al.  Finite-sum Composition Optimization via Variance Reduced Gradient Descent , 2016, AISTATS.

[26]  Zeyuan Allen Zhu,et al.  Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives , 2015, ICML.

[27]  Martin J. Wainwright,et al.  High-Dimensional Statistics , 2019 .

[28]  R. Rockafellar,et al.  Optimization of conditional value-at risk , 2000 .

[29]  Yue Yu,et al.  Fast Stochastic Variance Reduced ADMM for Stochastic Composition Optimization , 2017, IJCAI.

[30]  John N. Tsitsiklis,et al.  Neuro-dynamic programming: an overview , 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.

[31]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[32]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[33]  Zeyuan Allen Zhu,et al.  Variance Reduction for Faster Non-Convex Optimization , 2016, ICML.

[34]  Shai Shalev-Shwartz,et al.  SDCA without Duality, Regularization, and Individual Convexity , 2016, ICML.

[35]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.