Abstract: The expectation maximization (EM) algorithm is a widespread method for empirical Bayesian inference, but its expectation step (E-step) is often intractable. Employing a stochastic approximation scheme with Markov chain Monte Carlo (MCMC) can circumvent this issue, resulting in an algorithm known as MCMC-SAEM. While theoretical guarantees for MCMC-SAEM have previously been established, these results are restricted to the case where asymptotically unbiased MCMC algorithms are used. In practice, MCMC-SAEM is often run with asymptotically biased MCMC, for which the consequences are theoretically less understood. In this work, we fill this gap by analyzing the asymptotics and non-asymptotics of SAEM with biased MCMC steps, particularly the effect of bias. We also provide numerical experiments comparing the Metropolis-adjusted Langevin algorithm (MALA), which is asymptotically unbiased, and the unadjusted Langevin algorithm (ULA), which is asymptotically biased, on synthetic and real datasets. Experimental results show that ULA is more stable with respect to the choice of Langevin stepsize and can sometimes result in faster convergence.

What problem does this paper attempt to address?

This paper studies the Stochastic Approximation Expectation-Maximization (SAEM) algorithm when the expectation step (E-step) of the Expectation Maximization (EM) algorithm is intractable and is approximated with asymptotically biased Markov chain Monte Carlo (MCMC). EM is widely used for empirical Bayesian inference, but the E-step is often hard to compute when the observed data have a complex latent structure. MCMC-SAEM addresses this by numerically approximating the integral in the E-step with MCMC samples. Existing guarantees cover only asymptotically unbiased samplers such as the Metropolis-adjusted Langevin algorithm (MALA), whereas in practice asymptotically biased samplers such as the unadjusted Langevin algorithm (ULA) are often used. Commonly used samplers of this kind can also require careful tuning, particularly in high dimensions.

The main contribution is an analysis of SAEM with biased MCMC in both the asymptotic and the non-asymptotic regimes. For the asymptotic analysis, the authors extend the analysis of stochastic gradient optimization, control the error introduced by the MCMC bias, and prove that, even with biased MCMC, MCMC-SAEM can still converge to local maxima. The non-asymptotic analysis gives high-probability guarantees, under certain conditions, on the behavior of SAEM with biased MCMC after a finite number of steps.

The experimental section compares MALA and ULA on synthetic and real data. The results show that ULA is more stable with respect to the choice of Langevin step size and sometimes converges faster. This suggests that, although the asymptotic distribution of ULA is biased, its advantage in mixing rate may offset this drawback in practice. Overall, the paper narrows the gap between theory and practice for EM with biased MCMC and complements the theory with an empirical comparison of MALA and ULA.
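The difference between the two samplers inside an SAEM loop can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy model (latent z_i ~ N(0, theta), observation y_i | z_i ~ N(z_i, 1), so the maximum marginal likelihood estimate is mean(y^2) - 1), the step-size schedule, the lower bound on theta, and all function names. In this model the ULA stationary variance is inflated by the discretization, so the ULA-based estimate is expected to sit slightly above the MALA-based one.

```python
import numpy as np

def log_post(z, y, theta):
    # Unnormalized log p(z | y; theta) for the assumed toy model, per coordinate.
    return -0.5 * (y - z) ** 2 - 0.5 * z ** 2 / theta

def grad_log_post(z, y, theta):
    # Derivative of log_post with respect to z.
    return (y - z) - z / theta

def ula_step(z, y, theta, gamma, rng):
    # Unadjusted Langevin step: gradient drift plus Gaussian noise, no correction.
    noise = rng.standard_normal(z.shape)
    return z + gamma * grad_log_post(z, y, theta) + np.sqrt(2.0 * gamma) * noise

def mala_step(z, y, theta, gamma, rng):
    # Same proposal as ULA, followed by a Metropolis-Hastings accept/reject.
    # The posterior factorizes over i, so each coordinate is accepted separately.
    prop = ula_step(z, y, theta, gamma, rng)
    log_q_fwd = -(prop - z - gamma * grad_log_post(z, y, theta)) ** 2 / (4.0 * gamma)
    log_q_bwd = -(z - prop - gamma * grad_log_post(prop, y, theta)) ** 2 / (4.0 * gamma)
    log_alpha = (log_post(prop, y, theta) - log_post(z, y, theta)
                 + log_q_bwd - log_q_fwd)
    accept = np.log(rng.uniform(size=z.shape)) < log_alpha
    return np.where(accept, prop, z)

def saem(y, kernel, gamma, n_iter=2000, burn_in=200, theta0=1.0, seed=0):
    # SAEM with one MCMC step per iteration (illustrative schedule and defaults).
    rng = np.random.default_rng(seed)
    z = np.zeros_like(y)
    theta, s = theta0, theta0
    for k in range(n_iter):
        # Simulation step: move the latent variables with the chosen kernel.
        z = kernel(z, y, theta, gamma, rng)
        # Stochastic approximation of the sufficient statistic E[z^2 | y; theta].
        delta = 1.0 if k < burn_in else (k - burn_in + 1) ** -0.7
        s = s + delta * (np.mean(z ** 2) - s)
        # M-step: for this model the maximizer is the statistic itself.
        # The floor at 0.2 is an ad hoc safeguard that keeps ULA stable.
        theta = max(s, 0.2)
    return theta

# Synthetic data with true theta = 2, so y_i ~ N(0, 3) marginally.
data_rng = np.random.default_rng(1)
y = data_rng.normal(0.0, np.sqrt(3.0), size=500)

theta_mle = np.mean(y ** 2) - 1.0
theta_mala = saem(y, mala_step, gamma=0.2)
theta_ula = saem(y, ula_step, gamma=0.2)
print("MLE:", theta_mle, "SAEM+MALA:", theta_mala, "SAEM+ULA:", theta_ula)
```

With a smaller `gamma` the gap between the two estimates should shrink, at the cost of slower mixing. This toy example only illustrates the bias of ULA; it does not reproduce the paper's experiments or its stability findings.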

Stochastic Approximation with Biased MCMC for Expectation Maximization

Deterministic Approximate EM Algorithm; Application to the Riemann Approximation EM and the Tempered EM

Stochastic Expectation Maximization with Variance Reduction.

Efficient stochastic optimisation by unadjusted Langevin Monte Carlo. Application to maximum marginal likelihood and empirical Bayesian estimation

AdamMCMC: Combining Metropolis Adjusted Langevin with Momentum-based Optimization

Finding our Way in the Dark: Approximate MCMC for Approximate Bayesian Methods

Stochastic Gradient Descent as Approximate Bayesian Inference

Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models.

Convergent stochastic Expectation Maximization algorithm with efficient sampling in high dimension. Application to deformable template model estimation

Stochastic Subgradient MCMC Methods.

Stochastic Gradient Markov Chain Monte Carlo

An Effective EM Algorithm for Mixtures of Gaussian Processes Via the MCMC Sampling and Approximation.

Convergence of Expectation-Maximization Algorithm With Mixed-Integer Optimization

Uncertainty Computation at Finite Distance in Nonlinear Mixed Effects Models—a New Method Based on Metropolis-Hastings Algorithm

Computing the Bias of Constant-step Stochastic Approximation with Markovian Noise

On the Behavior of the Expectation-Maximization Algorithm for Mixture Models

Unbiased Kinetic Langevin Monte Carlo with Inexact Gradients

Unbiased Markov chain quasi-Monte Carlo for Gibbs samplers

The Basic Idea behind Expectation-Maximization