Convergence Of The Unadjusted Langevin Algorithm For Discontinuous Gradients

Tim Johnston, Sotirios Sabanis
2023-12-04
Abstract: We demonstrate that for strongly log-convex densities whose potentials are discontinuous on manifolds, the ULA algorithm converges with stepsize bias of order $1/2$ in Wasserstein-p distance. Our resulting bound is then of the same order as the convergence of ULA for gradient Lipschitz potential. Additionally, we show that so long as the gradient of the potential obeys a growth bound (therefore imposing no regularity condition), the algorithm has stepsize bias of order $1/4$. We therefore unite two active areas of research: i) the study of numerical methods for SDEs with discontinuous coefficients and ii) the study of the non-asymptotic bias of the ULA algorithm (and variants). In particular this is the first result of the former kind we are aware of on an unbounded time interval.
Probability
What problem does this paper attempt to address?
This paper studies the convergence of the Unadjusted Langevin Algorithm (ULA) for densities that are strongly log-convex in the authors' terminology, meaning that the potential \( U \) is strongly convex, when the gradient \( \nabla U \) is discontinuous. The authors bound the stepsize bias of ULA in Wasserstein-p distance and show that the algorithm still converges at an explicit order despite the discontinuities. They also treat the case where the gradient satisfies only a growth condition, with no regularity requirement.

### Core Problems of the Paper

1. **Convergence under Strongly Log-Convex Density Functions**
   - For a density whose potential \( U \) is strongly convex and whose gradient \( \nabla U \) is discontinuous on some manifolds, the stepsize bias of ULA in Wasserstein-p distance is \( O(\gamma^{1/2}) \), where \( \gamma \) is the stepsize.
   - This matches the order known for ULA when the potential is gradient Lipschitz.

2. **The Case Where the Gradient Only Satisfies the Growth Condition**
   - When the gradient satisfies only the linear growth condition, with no regularity requirement, the stepsize bias of ULA in Wasserstein-p distance is \( O(\gamma^{1/4}) \).
   - ULA therefore still converges, at a slower rate in the stepsize, even when the gradient is very irregular.

### Mathematical Expressions

- **Strongly Log-Convex Density Function**
  \[ \langle \nabla U(x)-\nabla U(y), x - y\rangle\geq\mu\|x - y\|^2,\quad\forall x, y\in\mathbb{R}^d \]
  where \(\mu > 0\) is the strong monotonicity constant.
- **Convergence in Wasserstein-p Distance**
  - For the case of discontinuous gradients:
    \[ W_p(\pi_\beta, L(x_n))\leq W_p(\xi,\pi_\beta)e^{-\mu\gamma n}+c\gamma^{1/2} \]
  - For the case where the gradient only satisfies the growth condition:
    \[ W_p(\pi_\beta, L(x_n))\leq W_p(\xi,\pi_\beta)e^{-\mu\gamma n}+c\gamma^{1/4} \]

  In both bounds \(\gamma\) is the stepsize, \(n\) the number of iterations, \(L(x_n)\) the law of the \(n\)-th iterate, and \(\pi_\beta\) the target measure. The first term decays exponentially in \(n\); the second term is the stepsize bias.

### Main Contributions

1. **Unifying Two Research Directions**
   - The paper connects two areas of research: numerical methods for SDEs with discontinuous coefficients, and the non-asymptotic bias of ULA and its variants.
   - To the authors' knowledge, this is the first convergence result for SDEs with discontinuous coefficients that holds on an unbounded time interval.

2. **Technical Challenges**
   - The authors handle the unbounded time interval by introducing an appropriate exponentially weighted difference process and applying key lemmas such as the local time formula.

3. **Theoretical Significance**
   - The results extend the existing theory of ULA beyond gradient Lipschitz potentials and give a theoretical basis for sampling problems in which the gradient is irregular.

### Conclusion

This paper analyzes the convergence of ULA under strongly log-convex densities with discontinuous gradients, a case not covered by earlier non-asymptotic bounds. The results are relevant to understanding and improving sampling and optimization algorithms based on Langevin dynamics.
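
As an illustration of the setting summarized above, the following is a minimal Python sketch of ULA applied to a potential with a discontinuous gradient. It is not taken from the paper. The potential \( U(x) = \tfrac{1}{2}\|x\|^2 + |x_1| \), the choice \(\beta = 1\), the stepsize, and the function names `grad_U` and `ula` are all assumptions made for this example. The gradient \( \nabla U(x) = x + \operatorname{sign}(x_1)e_1 \) jumps across the hyperplane \(\{x_1 = 0\}\) but still satisfies the strong monotonicity condition with \(\mu = 1\).

```python
import numpy as np


def grad_U(x):
    """Gradient of the illustrative potential U(x) = 0.5*||x||^2 + |x_1|.

    The gradient is discontinuous on the hyperplane {x_1 = 0}, yet it
    satisfies <grad_U(x) - grad_U(y), x - y> >= ||x - y||^2 (mu = 1).
    np.sign(0) = 0, which is a valid subgradient choice at the jump.
    """
    g = x.copy()
    g[..., 0] += np.sign(x[..., 0])
    return g


def ula(x0, gamma, n_steps, rng):
    """Run the standard ULA update (with beta = 1, an assumption here):

        x_{n+1} = x_n - gamma * grad_U(x_n) + sqrt(2 * gamma) * z_n,

    where z_n is standard Gaussian noise. Each row of x0 is an
    independent chain; the final iterates are returned.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - gamma * grad_U(x) + np.sqrt(2.0 * gamma) * noise
    return x


# 2000 independent chains in dimension 2, all started at (3, 3).
rng = np.random.default_rng(0)
x0 = np.full((2000, 2), 3.0)
samples = ula(x0, gamma=0.01, n_steps=500, rng=rng)

# Empirical summary of the final iterates.
print("mean per coordinate:", samples.mean(axis=0))
print("variance per coordinate:", samples.var(axis=0))
```

The target in this example is symmetric about the origin, so the empirical mean should be close to zero once the exponentially decaying term is negligible. The first coordinate, which carries the extra \(|x_1|\) term, should show a smaller variance than the second. Repeating the run with several values of `gamma` gives a rough empirical view of how the bias depends on the stepsize, although this sketch does not estimate a Wasserstein distance.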