HMC and underdamped Langevin united in the unadjusted convex smooth case

Nicolaï Gouraud, Pierre Le Bris, Adrien Majka, Pierre Monmarché
2024-05-22
Abstract: We consider a family of unadjusted generalized HMC samplers, which includes standard position HMC samplers and discretizations of the underdamped Langevin process. A detailed analysis and optimization of the parameters is conducted in the Gaussian case, which shows an improvement from $1/\kappa$ to $1/\sqrt{\kappa}$ for the convergence rate in terms of the condition number $\kappa$ by using partial velocity refreshment, with respect to classical full refreshments. A similar effect is observed empirically for two related algorithms, namely Metropolis-adjusted gHMC and kinetic piecewise-deterministic Markov processes. Then, a stochastic gradient version of the samplers is considered, for which dimension-free convergence rates are established for log-concave smooth targets over a large range of parameters, gathering in a unified framework previous results on position HMC and underdamped Langevin and extending them to HMC with inertia.
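To make the sampler family concrete, here is a minimal NumPy sketch of one unadjusted gHMC iteration. It assumes unit mass and a partial velocity refreshment \( v \leftarrow \eta v + \sqrt{1-\eta^2}\,G \) with \( G \sim \mathcal{N}(0, I_d) \), applied before \( K \) leapfrog (velocity Verlet) steps of size \( \delta \). The order of refreshment and integration and the names `leapfrog`, `ghmc_step` and `run_ghmc` are illustrative assumptions, not the authors' code. Setting `eta = 0` gives the full refreshment of classical HMC. With `K = 1` and `eta = exp(-gamma * delta)`, one iteration is a standard splitting discretization of underdamped Langevin dynamics with friction `gamma`. Nothing in the sketch requires `grad_U` to be exact, so it can also be passed a stochastic-gradient estimator.

```python
import numpy as np


def leapfrog(x, v, grad_U, delta, K):
    """K velocity-Verlet steps of size delta for H(x, v) = U(x) + |v|^2 / 2."""
    v = v - 0.5 * delta * grad_U(x)          # initial half kick
    for k in range(K):
        x = x + delta * v                    # drift
        g = grad_U(x)
        # full kick between drifts, final half kick after the last drift
        v = v - (delta if k < K - 1 else 0.5 * delta) * g
    return x, v


def ghmc_step(x, v, grad_U, delta, K, eta, rng):
    """One unadjusted gHMC iteration (sketch): partial refreshment, then K leapfrog steps."""
    G = rng.standard_normal(v.shape)
    v = eta * v + np.sqrt(1.0 - eta**2) * G  # eta = 0: full refreshment (classical HMC)
    return leapfrog(x, v, grad_U, delta, K)  # no Metropolis correction: unadjusted chain


def run_ghmc(x0, grad_U, delta, K, eta, n_iter, seed=0):
    """Run the chain and return the positions after each iteration."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    v = rng.standard_normal(x.shape)
    xs = np.empty((n_iter,) + x.shape)
    for n in range(n_iter):
        x, v = ghmc_step(x, v, grad_U, delta, K, eta, rng)
        xs[n] = x
    return xs


if __name__ == "__main__":
    # Standard Gaussian target in d = 2: U(x) = |x|^2 / 2, so grad U(x) = x.
    samples = run_ghmc(np.zeros(2), lambda x: x, delta=0.1, K=10, eta=0.5, n_iter=20000)
    print(samples[2000:].mean(axis=0), samples[2000:].var(axis=0))
```

Because the chain is unadjusted, the empirical variance is only close to 1, with an \( O(\delta^2) \) discretization bias.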
Probability, Statistics Theory
What problem does this paper attempt to address?
The paper addresses how to tune the parameters of Markov chain Monte Carlo (MCMC) samplers to improve sampling efficiency. It focuses on a class of unadjusted generalized Hamiltonian Monte Carlo (gHMC) chains. In particular, it studies how partial velocity refreshment (partially updating the momentum at each iteration instead of resampling it fully) improves the convergence rate of the sampler when the condition number is large.

### Main research questions

**Parameter optimization for a Gaussian target**

- The paper first analyzes in detail how to choose the parameters of the gHMC sampler when the target distribution is Gaussian. These parameters are the time step \( \delta \), the number of integration steps \( K \), and the damping (partial refreshment) parameter \( \eta \).
- For a given condition number \( \kappa = L/m \), the authors show that partial velocity refreshment improves the convergence rate from \( 1/\kappa \) to \( 1/\sqrt{\kappa} \).
- For a small relative tolerance \( \epsilon' \), the optimal parameters are
  \[ \delta' = \sqrt{8\epsilon'}, \quad K = \left\lfloor \frac{\pi}{\delta'} \left(1 + \frac{1}{\sqrt{\kappa}}\right) \right\rfloor, \quad \eta = \frac{1 - \sin\left(\frac{\pi}{1 + \sqrt{\kappa}}\right)}{\cos\left(\frac{\pi}{1 + \sqrt{\kappa}}\right)} \]
- For large \( \kappa \), \( \ln(1/\eta) = \ln\left(\frac{\cos\left(\frac{\pi}{1 + \sqrt{\kappa}}\right)}{1 - \sin\left(\frac{\pi}{1 + \sqrt{\kappa}}\right)}\right) \approx \frac{\pi}{1 + \sqrt{\kappa}} \). This choice therefore gives the convergence rate
  \[ \rho \approx \frac{\ln(1/\eta)}{K} \approx \frac{\delta' \ln\left(\frac{\cos\left(\frac{\pi}{1 + \sqrt{\kappa}}\right)}{1 - \sin\left(\frac{\pi}{1 + \sqrt{\kappa}}\right)}\right)}{\pi \left(1 + \frac{1}{\sqrt{\kappa}}\right)} \approx \frac{\delta'}{\sqrt{\kappa}} \]
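The Gaussian-case parameter choice is easy to evaluate numerically. The sketch below uses a hypothetical helper, `gaussian_ghmc_parameters`, to compute \( \delta' \), \( K \) and \( \eta \) from \( \kappa \) and \( \epsilon' \) using the formulas quoted above. It then estimates the rate as \( \ln(1/\eta)/K \). That rate expression is our reading of the summary, not code from the authors. The relative step \( \delta' \) is left as defined in the paper and is not converted back to \( \delta \).

```python
import math


def gaussian_ghmc_parameters(kappa, eps_rel):
    """Hypothetical helper: parameter choice quoted for a Gaussian target.

    Returns (delta_rel, K, eta, rho): relative step size, number of leapfrog
    steps, damping parameter, and the approximate rate ln(1/eta) / K.
    """
    delta_rel = math.sqrt(8.0 * eps_rel)
    x = math.pi / (1.0 + math.sqrt(kappa))
    K = math.floor(math.pi / delta_rel * (1.0 + 1.0 / math.sqrt(kappa)))
    eta = (1.0 - math.sin(x)) / math.cos(x)   # tends to 1 as kappa grows
    rho = math.log(1.0 / eta) / K
    return delta_rel, K, eta, rho


if __name__ == "__main__":
    for kappa in (1e2, 1e4, 1e6):
        d_rel, K, eta, rho = gaussian_ghmc_parameters(kappa, eps_rel=0.005)
        # ratio of the computed rate to the asymptotic prediction delta' / sqrt(kappa)
        ratio = rho * math.sqrt(kappa) / d_rel
        print(f"kappa={kappa:.0e}  K={K}  eta={eta:.5f}  ratio={ratio:.3f}")
```

For large \( \kappa \) the printed ratio stays close to 1, apart from the rounding introduced by the floor in \( K \). This matches the \( 1/\sqrt{\kappa} \) scaling. For moderate \( \kappa \) it is noticeably smaller, because the approximation \( (1+\sqrt{\kappa})^2/\sqrt{\kappa} \approx \sqrt{\kappa} \) is not yet accurate.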
**Dimension-free non-asymptotic Wasserstein convergence in the general strongly convex and smooth case**

- The paper then considers a general strongly convex and smooth potential, that is, \( 0 < m I_d \leq \nabla^2 U \leq L I_d \), where \( \nabla^2 U \) need not be constant.
- The authors prove that the gHMC sampler still satisfies dimension-free Wasserstein convergence bounds in this setting. The bounds hold over the parameter range \( \delta K \leq T \) and \( 1 - \eta \geq \gamma K \delta \).
- The convergence result reads
  \[ W_2(\text{Law}(x_n), \bar{\pi}) \leq C e^{-a n K \delta} W_2(\text{Law}(x_0), \bar{\pi}) + \frac{\delta^2 d + s}{\delta (d + \sigma^2/p)} \]
  where \( \sigma^2 \) is the variance of a single stochastic gradient realization, and \( W_2 \) represents \( L_2 \) Wasserstein