Abstract: The expectation-maximization (EM) algorithm is a widely used iterative algorithm for computing a (local) maximum likelihood estimate (MLE). It can be used in an extensive range of problems, including the clustering of data based on the Gaussian mixture model (GMM). Numerical instability and convergence problems may arise in situations where the sample size is not much larger than the data dimensionality. In such low sample support (LSS) settings, the covariance matrix update in the EM-GMM algorithm may become singular or poorly conditioned, causing the algorithm to crash. On the other hand, in many signal processing problems, a priori information may be available indicating certain structures for the different cluster covariance matrices. In this paper, we present a regularized EM algorithm for GMMs that can make efficient use of such prior knowledge as well as cope with LSS situations. The method aims to maximize a penalized GMM likelihood, where regularized estimation may be used to ensure positive definiteness of the covariance matrix updates and to shrink the estimators towards some structured target covariance matrices. We show that the theoretical guarantees of convergence hold, leading to a better-performing EM algorithm for structured covariance matrix models or in low sample settings.

What problem does this paper attempt to address?

This paper addresses the numerical instability and convergence problems that arise when the Gaussian mixture model (GMM) is used for clustering under low sample support (LSS). Specifically, when the number of samples is not much larger than the data dimension, the covariance matrix update in the EM algorithm may become singular or ill-conditioned, causing the algorithm to fail. The paper also considers signal processing problems in which prior information indicates that the covariance matrices of the different clusters have a certain structure. It therefore proposes a regularized EM algorithm that exploits this prior knowledge through a penalty term and improves performance under low sample support. The method maximizes a penalized GMM likelihood, where regularized estimation ensures that each covariance matrix update is positive definite and shrinks the estimate towards a structured target covariance matrix. This addresses the numerical stability problem and also improves clustering accuracy in the low-sample-support setting. The paper also shows that the theoretical convergence guarantees hold and verifies the results through simulation experiments.
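The effect of shrinking a covariance update towards a target can be illustrated with a minimal sketch. The summary above does not state the exact form of the penalty, so the code below assumes a simple convex combination of the responsibility-weighted sample covariance and a scaled-identity target. The function name `regularized_covariance`, the shrinkage weight `alpha`, and the choice of target are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

def regularized_covariance(X, resp, mean, target, alpha):
    """Hypothetical shrinkage update for one cluster (not the paper's estimator).

    X: (n, p) data, resp: (n,) responsibilities of this cluster,
    mean: (p,) cluster mean, target: (p, p) target covariance,
    alpha: shrinkage weight in [0, 1] (assumed, user-chosen).
    """
    diff = X - mean
    # Responsibility-weighted sample covariance (the usual EM-GMM update).
    S = (resp[:, None] * diff).T @ diff / resp.sum()
    # Assumed regularization: convex combination with the target matrix.
    return (1.0 - alpha) * S + alpha * target

rng = np.random.default_rng(0)
n, p = 5, 10  # low sample support: fewer samples than dimensions
X = rng.standard_normal((n, p))
resp = np.ones(n)  # all points fully assigned to a single cluster
mean = (resp[:, None] * X).sum(axis=0) / resp.sum()

# alpha = 0 recovers the unregularized update, which is rank deficient here.
S_plain = regularized_covariance(X, resp, mean, np.eye(p), alpha=0.0)
rank_plain = np.linalg.matrix_rank(S_plain)

# Scaled-identity target (an assumption chosen for illustration).
target = (np.trace(S_plain) / p) * np.eye(p)
S_reg = regularized_covariance(X, resp, mean, target, alpha=0.3)
min_eig_reg = np.linalg.eigvalsh(S_reg).min()

print("rank of unregularized update:", rank_plain, "of", p)
print("smallest eigenvalue after shrinkage:", min_eig_reg)
```

With fewer samples than dimensions, the unregularized update cannot have full rank, so evaluating the Gaussian density in the next E-step would fail. Any `alpha` greater than zero combined with a positive definite target yields a positive definite update in this sketch.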
