Subgradient Langevin Methods for Sampling from Non-smooth Potentials

Andreas Habring, Martin Holler, Thomas Pock
2024-05-26
Abstract: This paper is concerned with sampling from probability distributions $\pi$ on $\mathbb{R}^d$ admitting a density of the form $\pi(x) \propto e^{-U(x)}$, where $U(x)=F(x)+G(Kx)$ with $K$ being a linear operator and $G$ being non-differentiable. Two different methods are proposed, both employing a subgradient step with respect to $G\circ K$, but, depending on the regularity of $F$, either an explicit or an implicit gradient step with respect to $F$ can be implemented. For both methods, non-asymptotic convergence proofs are provided, with improved convergence results for more regular $F$. Further, numerical experiments are conducted for simple 2D examples, illustrating the convergence rates, and for examples of Bayesian imaging, showing the practical feasibility of the proposed methods for high-dimensional data.
Optimization and Control, Computation
What problem does this paper attempt to address?
This paper addresses the problem of sampling from probability distributions \(\pi(x)\propto e^{-U(x)}\) whose potential \(U\) is non-smooth. Specifically, \(U(x)=F(x)+G(Kx)\), where \(F\) and \(G\) are both convex and lower-semicontinuous functions, \(K\) is a linear operator, and \(G\) is non-differentiable. Distributions of this type are common in Bayesian inference and image processing, especially for high-dimensional data.

### Main Contributions

1. **Two new sampling algorithms**:
   - **Proximal-subgradient Langevin algorithm (Prox-sub)**: combines a subgradient step with respect to \(G\circ K\) with a proximal (implicit gradient) step with respect to \(F\).
   - **Gradient-subgradient Langevin algorithm (Grad-sub)**: combines a subgradient step with respect to \(G\circ K\) with an explicit gradient step with respect to \(F\), for the case where \(F\) is sufficiently regular.
2. **Non-asymptotic convergence results**:
   - Non-asymptotic convergence proofs are given for both algorithms under different regularity conditions.
   - Improved convergence results are obtained when \(F\) is more regular.
3. **Numerical experiments**:
   - The convergence rates are illustrated on simple two-dimensional examples.
   - Examples from Bayesian imaging show the practical feasibility of the methods for high-dimensional data.

### Background and Motivation

In many practical applications, such as Bayesian inference in mathematical imaging, one needs to sample from probability distributions of the form \(\pi(x)\propto e^{-U(x)}\). When \(U\) contains non-smooth parts, standard gradient-based methods such as the Langevin algorithm are not directly applicable, because they require the gradient of \(U\), which need not exist everywhere. The paper therefore proposes two algorithms that replace the gradient of the non-smooth part with a subgradient.

### Method Overview

- **Proximal-subgradient Langevin algorithm (Prox-sub)**:
  - Each iteration consists of a subgradient step with respect to \(G\circ K\) and a proximal step with respect to \(F\), followed by the addition of a Gaussian random variable.
  - Applicable when \(F\), like \(G\), may be non-smooth.
- **Gradient-subgradient Langevin algorithm (Grad-sub)**:
  - Each iteration uses an explicit gradient step with respect to \(F\) together with a subgradient step with respect to \(G\circ K\).
  - Applicable when \(F\) has stronger regularity.

### Convergence Analysis

- The non-asymptotic convergence results show that each algorithm reaches a prescribed accuracy within a finite number of steps.
- The theoretical analysis includes estimates of the free-energy functional \(\mathcal{F}(\mu)\).

### Numerical Experiments

- The convergence rates of the algorithms are verified on two-dimensional examples.
- Examples from Bayesian imaging demonstrate the methods on high-dimensional data.

### Summary

The Proximal-subgradient and Gradient-subgradient Langevin algorithms proposed in this paper allow sampling from probability distributions with non-smooth potentials. Both methods come with non-asymptotic convergence guarantees, and the numerical experiments indicate that they are feasible for high-dimensional Bayesian imaging problems.
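
The two iterations described under Method Overview can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy choices \(F(x)=\tfrac12\|x\|^2\) and \(G(z)=\|z\|_1\), the matrix `K`, the step size `tau`, the ordering of the steps inside one iteration (which follows the verbal description above), and all function names. The paper's exact schemes and step-size conditions may differ.

```python
import numpy as np

def subgrad_l1(z):
    # One valid subgradient of G(z) = ||z||_1; np.sign gives 0 at z = 0,
    # which lies in the subdifferential [-1, 1].
    return np.sign(z)

def prox_quadratic(v, tau):
    # prox_{tau F}(v) for the assumed toy choice F(x) = 0.5 * ||x||^2.
    return v / (1.0 + tau)

def grad_quadratic(x):
    # Gradient of the same toy F.
    return x

def prox_sub_step(x, K, tau, rng):
    # Subgradient step on G o K, implicit (proximal) step on F, then Gaussian noise.
    g = K.T @ subgrad_l1(K @ x)
    y = prox_quadratic(x - tau * g, tau)
    return y + np.sqrt(2.0 * tau) * rng.standard_normal(x.shape)

def grad_sub_step(x, K, tau, rng):
    # Explicit gradient step on F plus subgradient step on G o K, then Gaussian noise.
    g = K.T @ subgrad_l1(K @ x)
    drift = grad_quadratic(x) + g
    return x - tau * drift + np.sqrt(2.0 * tau) * rng.standard_normal(x.shape)

def run_chain(step, K, tau=0.05, n_steps=20000, seed=0):
    # Runs one chain from the origin and stores every iterate.
    rng = np.random.default_rng(seed)
    x = np.zeros(K.shape[1])
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x = step(x, K, tau, rng)
        samples[k] = x
    return samples

# Hypothetical 2D example: U(x) = 0.5 * ||x||^2 + ||K x||_1.
K = np.array([[1.0, -1.0], [0.0, 1.0]])
samples_prox = run_chain(prox_sub_step, K)
samples_grad = run_chain(grad_sub_step, K)
```

Because the toy potential is symmetric about the origin, the sample means of both chains should be close to zero, which gives a simple sanity check. With a fixed step size the chains are generally biased, so the samples only approximate \(\pi\).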