Improving the convergence of Markov chains via permutations and projections

Michael C.H. Choi, Max Hird, Youjia Wang
2024-11-13
Abstract: This paper aims at improving the convergence to equilibrium of finite ergodic Markov chains via permutations and projections. First, we prove that a specific mixture of permuted Markov chains arises naturally as a projection under the KL divergence or the squared-Frobenius norm. We then compare various mixing properties of the mixture with other competing Markov chain samplers and demonstrate that it enjoys improved convergence. This geometric perspective motivates us to propose samplers based on alternating projections to combine different permutations and to analyze their rate of convergence. We give necessary, and under some additional assumptions also sufficient, conditions for the projection to achieve stationarity in the limit in terms of the trace of the transition matrix. We proceed to discuss tuning strategies of the projection samplers when these permutations are viewed as parameters. Along the way, we reveal connections between the mixture and a Markov chain Sylvester's equation as well as assignment problems, and highlight how these can be used to understand and improve Markov chain mixing. We provide two examples as illustrations. In the first example, the projection sampler (with a suitable choice of the permutation) improves upon Metropolis-Hastings in a discrete bimodal distribution with a reduced relaxation time from exponential to polynomial in the system size, while in the second example, the mixture of permuted Markov chains yields a mixing time that is logarithmic in system size (with high probability under random permutation), compared to a linear mixing time in the Diaconis-Holmes-Neal sampler.
Probability, Optimization and Control, Computation
What problem does this paper attempt to address?
This paper addresses the slow convergence of finite ergodic Markov chains to equilibrium. The authors improve the convergence of such chains by means of permutations and projections.

### Main problem description in the paper

In many applications, such as statistical physics and machine learning, Markov chain Monte Carlo (MCMC) methods are widely used to sample from complex probability distributions. However, these chains often converge slowly, especially in high-dimensional spaces or spaces with complex structure. Accelerating their convergence is therefore an important problem.

### Main contributions

1. **A new perspective based on permutations and projections**:
   - The authors prove that a specific mixture of permuted Markov chains arises naturally as a projection under the KL divergence or the squared Frobenius norm.
   - Samplers based on alternating projections are proposed, and their rates of convergence are analyzed.
2. **Improved mixing of the mixture of permuted Markov chains**:
   - The mixture is compared with competing samplers on several mixing measures (the Dobrushin coefficient, asymptotic variance, spectral gap, and average hitting time) and is shown to enjoy improved convergence.
3. **Theory and examples**:
   - Necessary conditions, and under additional assumptions also sufficient conditions, are given for the projection to achieve stationarity in the limit, stated in terms of the trace of the transition matrix.
   - Two examples illustrate the method. In a discrete bimodal distribution, the projection sampler with a suitable permutation reduces the relaxation time of Metropolis-Hastings from exponential to polynomial in the system size. In the second example, the mixture of permuted Markov chains has a mixing time that is logarithmic in the system size (with high probability under a random permutation), compared with a linear mixing time for the Diaconis-Holmes-Neal sampler.
### Mathematical formula representation

The key formulas involved in the paper include:

- The mixture of permuted Markov chains:
  \[ P(Q)=\frac{1}{2}(P + QP^{*}Q) \]
  where \(P\) is the original transition matrix, \(Q\) is a permutation matrix, and \(P^{*}\) is the time reversal, or \(\ell_{2}(\pi)\)-adjoint, of \(P\).
- The deformation related to the KL divergence:
  \[ D_{\text{Q-left-deformed}}^{\text{KL}}(P \| L)=\sum_{x}\pi(x)\sum_{y}QP(x, y)\ln\left(\frac{QP(x, y)}{QL(x, y)}\right) \]
- Comparison of spectral parameters:
  \[ \lambda(QPQ)=\lambda(P) \]

These formulas show the main tools of the analysis: the mixture is characterized through the KL divergence, and its convergence is compared with that of other samplers through spectral quantities.

### Summary

This paper proposes improving the convergence of Markov chains through permutations and projections, and supports the approach with theoretical analysis and two worked examples.
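
As a small numerical illustration of the mixture formula \(P(Q)=\frac{1}{2}(P + QP^{*}Q)\), the sketch below builds a toy chain and checks three properties of the result. Everything concrete here is an assumption made for illustration and is not taken from the paper: the four-state target `pi`, the Metropolis chain on a path, the choice of \(Q\) as a swap of two equally likely states (so that \(Q\) is symmetric, satisfies \(Q^{2}=I\), and preserves \(\pi\)), and all helper names. Plain Python lists are used to keep the sketch dependency-free.

```python
from math import isclose

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def time_reversal(P, pi):
    """pi-adjoint of P: P*(x, y) = pi(y) * P(y, x) / pi(x)."""
    n = len(pi)
    return [[pi[y] * P[y][x] / pi[x] for y in range(n)] for x in range(n)]

def permuted_mixture(P, Q, pi):
    """Mixture P(Q) = (P + Q P* Q) / 2 from the formula above."""
    n = len(pi)
    QPstarQ = matmul(matmul(Q, time_reversal(P, pi)), Q)
    return [[0.5 * (P[x][y] + QPstarQ[x][y]) for y in range(n)] for x in range(n)]

def metropolis_path(pi):
    """Toy Metropolis chain on the path 0..n-1 with a +/-1 proposal (assumed example)."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x][y] = 0.5 * min(1.0, pi[y] / pi[x])
        P[x][x] = 1.0 - sum(P[x])  # rejected or out-of-range proposals stay put
    return P

def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices up to an absolute tolerance."""
    return all(isclose(a, b, abs_tol=tol) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Assumed toy target; states 0 and 3 have equal mass, so swapping them preserves pi.
pi = [0.1, 0.4, 0.4, 0.1]
P = metropolis_path(pi)
Q = [[0, 0, 0, 1],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [1, 0, 0, 0]]
P_bar = permuted_mixture(P, Q, pi)

n = len(pi)
# P(Q) is still a transition matrix.
rows_sum_to_one = all(isclose(sum(row), 1.0, abs_tol=1e-12) for row in P_bar)
# pi remains stationary for P(Q).
pi_next = [sum(pi[x] * P_bar[x][y] for x in range(n)) for y in range(n)]
keeps_pi = all(isclose(a, b, abs_tol=1e-12) for a, b in zip(pi_next, pi))
# The adjoint of P(Q) equals Q P(Q) Q under the assumptions on Q.
q_self_adjoint = close(time_reversal(P_bar, pi), matmul(matmul(Q, P_bar), Q))

print(rows_sum_to_one, keeps_pi, q_self_adjoint)
```

In this toy case the original chain only moves between neighbours on the path, whereas the mixture also allows direct moves such as 0 to 2. The sketch makes no claim about how much this helps mixing; the quantitative comparisons are the subject of the paper itself.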