On the convergence of dynamic implementations of Hamiltonian Monte Carlo and No U-Turn Samplers

Alain Durmus, Samuel Gruffaz, Miika Kailas, Eero Saksman, Matti Vihola
2024-10-18
Abstract: There is substantial empirical evidence about the success of dynamic implementations of Hamiltonian Monte Carlo (HMC), such as the No U-Turn Sampler (NUTS), in many challenging inference problems, but theoretical results about their behavior are scarce. The aim of this paper is to fill this gap. More precisely, we consider a general class of MCMC algorithms we call dynamic HMC. We show that this general framework encompasses NUTS as a particular case, implying the invariance of the target distribution as a by-product. Second, we establish conditions under which NUTS is irreducible and aperiodic and, as a corollary, ergodic. Under conditions similar to the ones existing for HMC, we also show that NUTS is geometrically ergodic. Finally, we improve existing convergence results for HMC, showing that this method is ergodic without any boundedness condition on the stepsize and the number of leapfrog steps, in the case where the target is a perturbation of a Gaussian distribution.
Computation, Probability, Statistics Theory, Machine Learning
What problem does this paper attempt to address?
The paper addresses the lack of theoretical analysis of dynamic implementations of Hamiltonian Monte Carlo (HMC), such as the No U-Turn Sampler (NUTS). Although these algorithms have been very successful on many challenging inference problems, theoretical results about their behavior are scarce. The paper aims to fill this gap by providing theoretical guarantees for NUTS, including irreducibility, aperiodicity, and geometric ergodicity.

### Main contributions:

1. **Proposing a general framework**: The paper introduces a general class of MCMC algorithms, called dynamic HMC, and shows that it includes NUTS as a particular case. Invariance of the target distribution follows as a by-product.
2. **Irreducibility and aperiodicity**: The paper establishes conditions under which the NUTS algorithm currently implemented in Stan is irreducible and aperiodic, from which ergodicity follows as a corollary.
3. **Geometric ergodicity**: Under conditions similar to those existing for HMC, the paper proves that NUTS is geometrically ergodic.
4. **Improving existing convergence results**: The paper also improves existing convergence results for HMC, proving that when the target distribution is a perturbation of a Gaussian distribution, HMC is ergodic without any boundedness condition on the step size and the number of leapfrog steps.

### Background:

- **HMC algorithm**: HMC is a Metropolis-Hastings algorithm for sampling from a target probability density \(\pi\). It generates proposals far from the current position by approximately solving Hamilton's equations, which avoids the random-walk behavior of many MCMC algorithms.
- **NUTS algorithm**: NUTS is a dynamic implementation of HMC. It selects the number of leapfrog steps \(T\) automatically, and it is typically combined with an adaptive mechanism (such as dual averaging) that tunes the step size \(h\).
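
As an illustration of the HMC background above, the following is a minimal sketch of plain (non-dynamic) HMC with a fixed step size \(h\) and a fixed number of leapfrog steps \(T\). It is not the paper's construction and not the NUTS algorithm. The function names (`leapfrog`, `hmc_step`, `run_chain`), the one-dimensional standard Gaussian target, and the default values of `h` and `T` are assumptions chosen only for illustration.

```python
import math
import random


def leapfrog(q, p, grad_U, h, T):
    """Approximate Hamilton's equations with T leapfrog steps of size h."""
    if T == 0:
        return q, p
    p = p - 0.5 * h * grad_U(q)        # opening half step for the momentum
    for i in range(T):
        q = q + h * p                  # full step for the position
        if i < T - 1:
            p = p - h * grad_U(q)      # full momentum step between position updates
    p = p - 0.5 * h * grad_U(q)        # closing half step for the momentum
    return q, p


def hmc_step(q, U, grad_U, h, T, rng):
    """One HMC transition: leapfrog proposal plus Metropolis-Hastings correction."""
    p = rng.gauss(0.0, 1.0)            # fresh standard Gaussian momentum
    q_new, p_new = leapfrog(q, p, grad_U, h, T)
    energy_old = U(q) + 0.5 * p * p
    energy_new = U(q_new) + 0.5 * p_new * p_new
    # Accept with probability min(1, exp(energy_old - energy_new)).
    accept_prob = math.exp(min(0.0, energy_old - energy_new))
    return q_new if rng.random() < accept_prob else q


def run_chain(n, q0=0.0, h=0.2, T=10, seed=0):
    """Run n HMC steps on an assumed 1D standard Gaussian target."""
    U = lambda q: 0.5 * q * q          # potential U = -log(pi), up to a constant
    grad_U = lambda q: q
    rng = random.Random(seed)
    q, samples = q0, []
    for _ in range(n):
        q = hmc_step(q, U, grad_U, h, T, rng)
        samples.append(q)
    return samples
```

Calling `run_chain(5000)` returns a list of draws whose empirical mean and variance should be close to 0 and 1. In this sketch `T` is fixed by the user; dynamic HMC methods such as NUTS instead choose the trajectory length during sampling.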
### Research motivation:

- **Theoretical gap**: Although NUTS performs very well in practice, relatively little research has been done on its theoretical properties. The paper aims to support the observed performance of NUTS with rigorous mathematical analysis.
- **Technical challenges**: The construction of NUTS makes traditional regularity conditions inapplicable, so new proof strategies are required that use global information to control the Hamiltonian trajectories.

### Conclusion:

By introducing a general framework for dynamic HMC, the paper proves, under suitable conditions, the irreducibility, aperiodicity, and geometric ergodicity of NUTS, providing an important basis for understanding its theoretical properties. The results also provide a reference for the theoretical analysis of other dynamic HMC algorithms.
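
For readers unfamiliar with the term, geometric ergodicity is commonly defined as follows. This is the standard textbook formulation in total variation distance, given here as an assumption about what is meant; the paper's precise statement may use a different norm or additional conditions. The symbols \(P\) (the Markov transition kernel of the sampler), \(M\), and \(\rho\) are notation introduced here, not taken from the summary.

\[
\left\| P^n(x, \cdot) - \pi \right\|_{\mathrm{TV}} \le M(x)\, \rho^n
\quad \text{for all } n \ge 0 \text{ and } \pi\text{-almost every } x,
\]

where \(\rho \in [0, 1)\) is a constant and \(M\) is a function that is finite \(\pi\)-almost everywhere. In words, the distribution of the chain after \(n\) steps approaches the target \(\pi\) at a geometric rate, with a constant that may depend on the starting point \(x\).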