Optimal variance-reduced stochastic approximation in Banach spaces

We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent, and achieve the local asymptotic minimax risk non-asymptotically. For linear operators, contractivity can be relaxed to multi-step contractivity, so that the theory applies to problems such as average-reward policy evaluation in reinforcement learning. We illustrate the theory with applications to stochastic shortest path problems, two-player zero-sum Markov games, and average-reward policy evaluation.

MSC 2020 classification: 62L20.
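To make the setting concrete, the following is a minimal sketch of an epoch-based variance-reduced stochastic approximation scheme for fixed-point estimation under a noisy query model. It is an illustration only, not the paper's exact algorithm: the demo operator, the noise model, the step size `lam`, the epoch schedule, and the batch size `recenter_batch` are all hypothetical choices made for this example.

```python
# Illustrative sketch: epoch-based variance-reduced stochastic approximation
# for estimating the fixed point of a contractive operator from noisy queries.
import numpy as np

rng = np.random.default_rng(0)

# Demo problem (hypothetical): a gamma-contractive affine operator
# F(theta) = gamma * A @ theta + b, queried through random perturbations.
d, gamma = 5, 0.9
A = np.linalg.qr(rng.standard_normal((d, d)))[0]  # orthogonal, so ||gamma * A|| = gamma < 1
b = rng.standard_normal(d)

def sample_operator():
    """Draw one stochastic operator F_hat whose expectation is F."""
    W = 0.1 * rng.standard_normal((d, d))  # multiplicative noise
    w = 0.1 * rng.standard_normal(d)       # additive noise
    return lambda theta: (gamma * A + W) @ theta + b + w

def variance_reduced_sa(theta0, epochs=8, recenter_batch=2048, inner_steps=512, lam=0.05):
    """Each epoch: (i) estimate F(theta_bar) from a fresh batch of queries,
    (ii) run recentered stochastic-approximation updates in which the same
    sampled operator is applied at theta and theta_bar, so its noise largely
    cancels once theta is close to theta_bar."""
    theta_bar = theta0.copy()
    for _ in range(epochs):
        F_bar = np.mean([sample_operator()(theta_bar) for _ in range(recenter_batch)], axis=0)
        theta = theta_bar.copy()
        for _ in range(inner_steps):
            F_hat = sample_operator()
            # Recentered update: single-sample difference plus the batch estimate.
            update = F_hat(theta) - F_hat(theta_bar) + F_bar
            theta = (1.0 - lam) * theta + lam * update
        theta_bar = theta
    return theta_bar

theta_hat = variance_reduced_sa(np.zeros(d))
theta_star = np.linalg.solve(np.eye(d) - gamma * A, b)  # true fixed point of the demo operator
print("sup-norm estimation error:", np.linalg.norm(theta_hat - theta_star, ord=np.inf))
```

The recentering step is the variance-reduction device: near the reference point, the effective noise in each inner update shrinks with the distance between the current iterate and the reference point, which is what allows instance-dependent rather than worst-case error bounds in schemes of this type.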
