Small World MCMC with Tempering: Ergodicity and Spectral Gap

Yongtao Guan, Matthew Stephens
DOI: https://doi.org/10.48550/arXiv.1211.4675
2012-11-20
Methodology
Abstract: When sampling a multi-modal distribution $\pi(x)$, $x\in \mathbb{R}^d$, a Markov chain with local proposals often mixes slowly, whereas a Small-World sampler \citep{guankrone}, a Markov chain that uses a mixture of local and long-range proposals, mixes fast. However, a Small-World sampler suffers from the curse of dimensionality because its spectral gap depends on the volume of each mode. We present a new sampler that combines tempering, Small-World sampling, and long-range proposals produced from samples in companion chains (as in, e.g., the Equi-Energy sampler). In its simplest form the sampler employs two Small-World chains: an exploring chain and a sampling chain. The exploring chain samples $\pi_t(x) \propto \pi(x)^{1/t}$, $t\in [1,\infty)$, and builds up an empirical distribution. Using this empirical distribution as its long-range proposal, the sampling chain is designed to have stationary distribution $\pi(x)$. We prove ergodicity of the algorithm and study its convergence rate. We show that the spectral gap of the exploring chain is enlarged by a factor of $t^{d}$ and that of the sampling chain is shrunk by a factor of $t^{-d}$. Importantly, the spectral gap of the exploring chain depends on the "size" of $\pi_t(x)$ while that of the sampling chain does not. Overall, the sampler enlarges a severe bottleneck at the cost of shrinking a mild one, and hence achieves faster mixing. The penalty on the spectral gap of the sampling chain can be significantly alleviated by extending the algorithm to multiple chains whose temperatures $\{t_k\}$ follow a geometric progression. If we allow $t_k \rightarrow 0$, the sampler becomes a global optimizer.
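The two-chain construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional two-mode target, the Gaussian proposals, and every function and parameter name (`log_pi`, `two_chain_sampler`, `s`, `local_sd`, `long_sd`) are assumptions made for illustration. In particular, the acceptance rule for the long-range move of the sampling chain treats a draw from the exploring chain's history as if it were an independent draw from $\pi_t$, and the local and long-range moves are accepted separately rather than through a single mixture proposal density; the paper may handle both points differently.

```python
import math
import random


def log_pi(x, mu=4.0, sigma=0.5):
    """Log-density (up to a constant) of an assumed equal-weight two-mode Gaussian mixture."""
    a = -0.5 * ((x - mu) / sigma) ** 2
    b = -0.5 * ((x + mu) / sigma) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mh_accept(log_ratio, rng):
    """Metropolis-Hastings accept/reject step for a given log acceptance ratio."""
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)


def two_chain_sampler(n_steps, t=4.0, s=0.2, local_sd=0.5, long_sd=8.0,
                      x0=4.0, seed=0):
    """Run an exploring chain (target pi^(1/t)) and a sampling chain (target pi) side by side.

    s is the probability of attempting a long-range move in either chain.
    Returns the states visited by the sampling chain.
    """
    rng = random.Random(seed)
    x_explore, x_sample = x0, x0
    history = []  # empirical distribution built up by the exploring chain
    samples = []
    for _ in range(n_steps):
        # Exploring chain: symmetric local or long-range Gaussian proposal,
        # accepted against the tempered target pi_t, proportional to pi^(1/t).
        sd = long_sd if rng.random() < s else local_sd
        y = x_explore + rng.gauss(0.0, sd)
        if mh_accept((log_pi(y) - log_pi(x_explore)) / t, rng):
            x_explore = y
        history.append(x_explore)

        # Sampling chain: target pi.
        if rng.random() < s:
            # Long-range move: propose a past state of the exploring chain.
            # ASSUMPTION: treated as an independence proposal with density
            # pi_t, giving the ratio (pi(y) / pi(x))^(1 - 1/t).
            y = rng.choice(history)
            log_ratio = (1.0 - 1.0 / t) * (log_pi(y) - log_pi(x_sample))
        else:
            # Local move: symmetric Gaussian random walk.
            y = x_sample + rng.gauss(0.0, local_sd)
            log_ratio = log_pi(y) - log_pi(x_sample)
        if mh_accept(log_ratio, rng):
            x_sample = y
        samples.append(x_sample)
    return samples


if __name__ == "__main__":
    samples = two_chain_sampler(20000)
    frac_right = sum(x > 0 for x in samples) / len(samples)
    print(f"fraction of samples in the right-hand mode: {frac_right:.3f}")
```

Setting `s=0.0` removes the long-range moves and leaves a purely local random walk, which in this toy target should remain trapped in the mode where it starts; comparing the two runs gives a rough feel for the bottleneck that the abstract describes.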