Optimal variance-reduced stochastic approximation in Banach spaces

We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent, and achieve the local asymptotic minimax risk non-asymptotically. For linear operators, contractivity can be relaxed to multi-step contractivity, so that the theory applies to problems such as average-reward policy evaluation in reinforcement learning. We illustrate the theory with applications to stochastic shortest-path problems, two-player zero-sum Markov games, and policy evaluation and Q-learning for tabular Markov decision processes.
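To make the variance-reduction idea concrete, below is a minimal Python sketch of one common recipe for schemes of this type: each epoch recenters at an anchor point using a large-batch Monte Carlo estimate of the operator, and inner iterations then apply the recentered update theta <- (1 - step) * theta + step * (F_hat(theta) - F_hat(theta_bar) + F_bar), where both noisy evaluations share the same random sample so their noise largely cancels. The function name `vr_fixed_point`, the oracle `sample_op`, and all schedule parameters (epoch count, batch size, step size) are illustrative assumptions, not the exact algorithm or tuning analyzed in the paper.

```python
import numpy as np

def vr_fixed_point(sample_op, theta0, epochs=5, inner_steps=2000,
                   recenter_batch=10000, step=0.1, seed=0):
    """Variance-reduced stochastic approximation for estimating the fixed
    point of a contractive operator F, given only a noisy oracle
    sample_op(theta, rng) whose expectation is F(theta)."""
    rng = np.random.default_rng(seed)
    theta_bar = np.asarray(theta0, dtype=float)  # current anchor point
    for _ in range(epochs):
        # Recentering step: a large batch gives a low-noise Monte Carlo
        # estimate of F at the anchor.
        F_bar = np.mean(
            [sample_op(theta_bar, rng) for _ in range(recenter_batch)], axis=0)
        theta = theta_bar.copy()
        for _ in range(inner_steps):
            # Couple the two evaluations through a shared seed, so the same
            # random sample is used at theta and at the anchor and the noise
            # largely cancels near the fixed point.
            s = int(rng.integers(2**32))
            g_theta = sample_op(theta, np.random.default_rng(s))
            g_anchor = sample_op(theta_bar, np.random.default_rng(s))
            # Recentered update: noisy evaluation at theta, debiased via the anchor.
            theta = (1 - step) * theta + step * (g_theta - g_anchor + F_bar)
        theta_bar = theta  # re-anchor for the next epoch
    return theta_bar
```

As a sanity check, the sketch can be run on a toy linear contraction, where the fixed point is available in closed form:

```python
# Toy example: F(theta) = A @ theta + b with additive Gaussian noise,
# where A has spectral norm < 1 (a contraction in the Euclidean norm).
A = np.array([[0.5, 0.2], [0.1, 0.4]])
b = np.array([1.0, -1.0])
noisy_F = lambda th, rng: A @ th + b + 0.5 * rng.standard_normal(2)
theta_hat = vr_fixed_point(noisy_F, np.zeros(2))
# Compare against the exact fixed point (I - A)^{-1} b.
print(theta_hat, np.linalg.solve(np.eye(2) - A, b))
```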
