Improved bounds for discretization of Langevin diffusions: Near-optimal rates without convexity

We present an improved analysis of the Euler-Maruyama discretization of the Langevin diffusion. Our analysis does not require global contractivity and yields polynomial dependence on the time horizon. Compared to existing approaches, we make an additional smoothness assumption and improve the existing rate from $O(\eta)$ to $O(\eta^2)$ in terms of the KL divergence, where $\eta$ is the step size. This result matches the correct order for numerical SDEs, without suffering from exponential time dependence. When applied to algorithms for sampling and learning, this result simultaneously improves all those methods based on Dalalyan's approach.
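For orientation, the scheme under analysis can be sketched as follows. This is a minimal illustration rather than the paper's own code. It assumes the standard overdamped Langevin diffusion $dX_t = -\nabla U(X_t)\,dt + \sqrt{2}\,dB_t$ and its Euler-Maruyama update $X_{k+1} = X_k - \eta \nabla U(X_k) + \sqrt{2\eta}\,\xi_k$ with $\xi_k \sim N(0, 1)$. The one-dimensional double-well potential and the names `grad_double_well` and `euler_maruyama_langevin` are hypothetical, chosen only to show a non-convex target.

```python
import math
import random


def grad_double_well(x):
    # Gradient of the hypothetical non-convex potential U(x) = (x**2 - 1)**2 / 4.
    return x * (x * x - 1.0)


def euler_maruyama_langevin(grad_u, x0, eta, n_steps, rng):
    """Iterate X_{k+1} = X_k - eta * grad_u(X_k) + sqrt(2 * eta) * xi_k.

    grad_u  : callable returning the gradient of the potential at a point
    x0      : initial point
    eta     : step size
    n_steps : number of discretization steps
    rng     : object with a gauss(mu, sigma) method, e.g. random.Random
    """
    x = x0
    path = [x]
    noise_scale = math.sqrt(2.0 * eta)
    for _ in range(n_steps):
        # Drift step along the negative gradient plus scaled Gaussian noise.
        x = x - eta * grad_u(x) + noise_scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Seeded run so the trajectory is reproducible.
rng = random.Random(0)
path = euler_maruyama_langevin(grad_double_well, x0=0.0, eta=0.01, n_steps=5000, rng=rng)
```

The step size `eta` here plays the role of $\eta$ in the rates quoted above; the sketch only simulates the chain and does not estimate any KL divergence.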
