Stochastic Approximation with Biased MCMC for Expectation Maximization

Samuel Gruffaz, Kyurae Kim, Alain Oliviero Durmus, Jacob R. Gardner
2024-02-28
Abstract: The expectation maximization (EM) algorithm is a widespread method for empirical Bayesian inference, but its expectation step (E-step) is often intractable. Employing a stochastic approximation scheme with Markov chain Monte Carlo (MCMC) can circumvent this issue, resulting in an algorithm known as MCMC-SAEM. While theoretical guarantees for MCMC-SAEM have previously been established, these results are restricted to the case where asymptotically unbiased MCMC algorithms are used. In practice, MCMC-SAEM is often run with asymptotically biased MCMC, for which the consequences are theoretically less understood. In this work, we fill this gap by analyzing the asymptotics and non-asymptotics of SAEM with biased MCMC steps, particularly the effect of bias. We also provide numerical experiments comparing the Metropolis-adjusted Langevin algorithm (MALA), which is asymptotically unbiased, and the unadjusted Langevin algorithm (ULA), which is asymptotically biased, on synthetic and real datasets. Experimental results show that ULA is more stable with respect to the choice of Langevin stepsize and can sometimes result in faster convergence.
Computation, Machine Learning, Optimization and Control
What problem does this paper attempt to address?
This paper studies the Stochastic Approximation Expectation-Maximization (SAEM) algorithm combined with biased Markov chain Monte Carlo (MCMC) methods, for settings where the expectation step (E-step) of the Expectation-Maximization (EM) algorithm is intractable. EM is widely used for empirical Bayesian inference, but when the observed data depend on complex latent structures, the E-step integral cannot be computed in closed form. MCMC-SAEM sidesteps this by approximating that integral with MCMC samples. Existing guarantees for MCMC-SAEM, however, assume asymptotically unbiased kernels such as the Metropolis-adjusted Langevin algorithm (MALA), whereas in practice asymptotically biased kernels such as the unadjusted Langevin algorithm (ULA) are often used, and their consequences were theoretically less understood.

The main contribution is an analysis of biased MCMC within SAEM, in both the asymptotic and the non-asymptotic regimes. For the asymptotic analysis, the authors extend results from stochastic gradient optimization, control the additional error introduced by the MCMC bias, and show that MCMC-SAEM can still converge toward local maxima despite that bias. The non-asymptotic analysis gives high-probability guarantees, under certain conditions, on the behavior of SAEM with biased MCMC after a finite number of iterations.

The experiments compare MALA and ULA on synthetic and real data. ULA turns out to be more stable with respect to the choice of Langevin step size and sometimes converges faster. This suggests that, although ULA's stationary distribution is biased, its advantage in mixing rate may offset this drawback in practice. Overall, the paper narrows the gap between theory and practice for biased MCMC in EM and offers empirical comparisons that are useful to practitioners choosing an MCMC kernel.
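To make the MCMC-SAEM loop and the MALA/ULA distinction concrete, here is a minimal sketch. It is not the authors' code: the toy model, function names, step-size schedule gamma_k = k^(-0.7), and the choice of one MCMC move per iteration are all illustrative assumptions. The model z_i ~ N(theta, 1), y_i | z_i ~ N(z_i, 1) is chosen because its answer is known (the marginal is y_i ~ N(theta, 2), so the MLE is the sample mean of y) and its M-step is simply theta = averaged mean of z. Each iteration draws latent variables with a Langevin kernel targeting p(z | y, theta), updates a stochastic-approximation average of the sufficient statistic, and then maximizes.

```python
import numpy as np

# Hypothetical toy model (assumption, not from the paper):
#   z_i ~ N(theta, 1),  y_i | z_i ~ N(z_i, 1)
def log_post(z, y, theta):
    # Unnormalized log p(z | y, theta)
    return -0.5 * np.sum((z - theta) ** 2) - 0.5 * np.sum((y - z) ** 2)

def grad_log_post(z, y, theta):
    return (theta - z) + (y - z)

def ula_step(z, y, theta, h, rng):
    # Unadjusted Langevin: one Euler-Maruyama step, no accept/reject,
    # so its stationary distribution is only approximately p(z | y, theta)
    return z + h * grad_log_post(z, y, theta) + np.sqrt(2 * h) * rng.standard_normal(z.shape)

def mala_step(z, y, theta, h, rng):
    # Metropolis-adjusted Langevin: same proposal as ULA, plus an MH correction
    prop = z + h * grad_log_post(z, y, theta) + np.sqrt(2 * h) * rng.standard_normal(z.shape)

    def log_q(to, frm):
        # Log proposal density (up to a constant) of moving frm -> to
        diff = to - frm - h * grad_log_post(frm, y, theta)
        return -np.sum(diff ** 2) / (4 * h)

    log_alpha = (log_post(prop, y, theta) + log_q(z, prop)
                 - log_post(z, y, theta) - log_q(prop, z))
    return prop if np.log(rng.uniform()) < log_alpha else z

def mcmc_saem(y, kernel, h=0.1, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    theta, s = 0.0, 0.0
    z = y.copy()  # arbitrary initialization of the latent variables
    for k in range(1, n_iter + 1):
        z = kernel(z, y, theta, h, rng)    # simulation step: one MCMC move
        gamma = 1.0 / k ** 0.7              # sum of gammas diverges, sum of squares is finite
        s = s + gamma * (np.mean(z) - s)    # stochastic approximation of E[mean(z) | y, theta]
        theta = s                           # M-step: argmax of the averaged complete-data objective
    return theta

# Usage: synthetic data drawn from the toy model's marginal; both estimates should be close to y.mean()
rng = np.random.default_rng(1)
y = rng.normal(3.0, np.sqrt(2.0), size=200)
print(mcmc_saem(y, ula_step), mcmc_saem(y, mala_step), y.mean())
```

In this Gaussian toy model, ULA's stationary distribution has the correct mean but variance 1/(2(1 - h)) instead of 1/2. Because the sufficient statistic here uses only the mean, ULA's bias does not shift the estimate of theta. In general models, the bias would propagate into the sufficient statistics, which is the effect the paper analyzes.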