Distribution Estimation for Stochastic Approximation in Finite Samples Using a Surrogate Stochastic Differential Equation Method

Evaluating the statistical error of an estimate produced by a stochastic approximation (SA) algorithm is useful for computing confidence regions and determining stopping times. Robbins-Monro (RM) type stochastic gradient descent (SGD) is a widely used SA method, and knowledge of the probability distribution of the SA process is useful for error analysis. Existing theory, however, characterizes only the asymptotic distribution in this setting; distribution functions in the finite-sample regime have not been clearly described. We developed a method to estimate the finite-sample distribution based on a surrogate process. We described the SGD process as an Euler-Maruyama (EM) scheme for certain RM-type stochastic differential equations (SDEs). Weak convergence theory for EM schemes validates the surrogate property in the sense of convergence in distribution. For the first time, we have shown that the solution of the Fokker-Planck (FP) equation for the surrogate SDE is appropriate for characterizing the evolution of the distribution function in the SGD process.
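The surrogate idea can be illustrated with a minimal sketch. It is not the paper's implementation, and every modeling choice in it is an assumption made for illustration: a scalar quadratic loss with gradient g(θ) = θ, Gaussian gradient noise with standard deviation `sigma`, and a gain sequence a_k = a0 / (k + k0). Under these assumptions the SGD recursion θ_{k+1} = θ_k − a_k (θ_k + σ ξ_k) is an EM step of size 1 for the SDE dθ = −a(t) θ dt + a(t) σ dW, with a(t) = a0 / (t + k0). Because the drift is linear, the FP solution is Gaussian, and its mean m and variance v satisfy m' = −a(t) m and v' = −2 a(t) v + a(t)² σ². The function names and parameters below are hypothetical.

```python
import numpy as np


def sgd_samples(n_steps, n_paths, theta0=1.0, a0=0.5, k0=10.0, sigma=1.0, seed=0):
    """Run n_paths independent RM-type SGD recursions for n_steps iterations.

    Assumed toy model: gradient g(theta) = theta plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    theta = np.full(n_paths, theta0, dtype=float)
    for k in range(n_steps):
        a_k = a0 / (k + k0)  # assumed decaying gain sequence
        noisy_grad = theta + sigma * rng.standard_normal(n_paths)
        theta = theta - a_k * noisy_grad
    return theta


def fokker_planck_moments(t_end, theta0=1.0, a0=0.5, k0=10.0, sigma=1.0, substeps=1000):
    """Mean and variance of the Gaussian FP solution of the surrogate SDE.

    Integrates m' = -a(t) m and v' = -2 a(t) v + (a(t) sigma)^2 with a fine
    explicit Euler grid, starting from a point mass at theta0.
    """
    n = int(t_end * substeps)
    dt = t_end / n
    m, v = theta0, 0.0
    for i in range(n):
        a = a0 / (i * dt + k0)
        m, v = m + dt * (-a * m), v + dt * (-2.0 * a * v + (a * sigma) ** 2)
    return m, v


if __name__ == "__main__":
    n_steps = 50
    theta = sgd_samples(n_steps, n_paths=20000)
    fp_mean, fp_var = fokker_planck_moments(float(n_steps))
    print("SGD empirical mean/variance:", theta.mean(), theta.var())
    print("FP surrogate mean/variance: ", fp_mean, fp_var)
```

In this toy setting the empirical finite-sample moments of SGD and the FP moments should be close but not identical, since the EM correspondence holds only up to discretization error. For nonlinear gradients the FP equation would generally need a numerical PDE solver rather than two moment equations.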
