Analysis of estimating the Bayes rule for Gaussian mixture models with a specified missing-data mechanism

Ziyang Lyu
DOI: https://doi.org/10.1007/s00180-023-01447-0
IF: 1.4049
2024-02-11
Computational Statistics
Abstract: Semi-supervised learning approaches have been successfully applied in a wide range of engineering and scientific fields. This paper investigates the generative model framework with a missingness mechanism for unclassified observations introduced by Ahfock and McLachlan (Stat Comput 30:1–12, 2020). We show that, in a partially classified sample, a classifier using Bayes' rule of allocation with a missing-data mechanism can surpass a fully supervised classifier in a two-class normal homoscedastic model. This holds especially when both the class overlap and the proportion of missing class labels are moderate to low, or when the overlap is large but few labels are missing. The classifier also outperforms one that does not model the missing-data mechanism, regardless of the overlap or the proportion of missing class labels. Simulations with two- and three-component normal mixture models with unequal covariances further corroborate these findings. Finally, we illustrate the use of the proposed classifier with a missing-data mechanism on interneuronal and skin lesion datasets.
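To make the setting concrete, here is a minimal sketch of Bayes' rule of allocation for a two-class normal mixture with a common variance (the homoscedastic case), restricted to one dimension. It also includes a missing-label probability that rises with the entropy of the posterior class probabilities. The logistic-in-log-entropy form is our reading of the mechanism in Ahfock and McLachlan (2020) and should be treated as an assumption. The function names, the univariate restriction, and the parameters `xi0` and `xi1` are hypothetical choices for illustration. This is not the paper's implementation, and parameter estimation is not shown.

```python
import math


def _logistic(eta):
    """Numerically stable logistic function 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def discriminant(y, pi1, mu1, mu2, sigma):
    """Linear discriminant log(tau_1 / tau_2) for a two-class normal mixture
    with class means mu1, mu2, common variance sigma**2 and prior pi1."""
    return math.log(pi1 / (1.0 - pi1)) + (mu1 - mu2) / sigma**2 * (y - 0.5 * (mu1 + mu2))


def posterior_probs(y, pi1, mu1, mu2, sigma):
    """Posterior class probabilities (tau_1, tau_2); in the homoscedastic
    case tau_1 is the logistic function of the linear discriminant."""
    d = discriminant(y, pi1, mu1, mu2, sigma)
    return _logistic(d), _logistic(-d)


def bayes_allocate(y, pi1, mu1, mu2, sigma):
    """Bayes' rule of allocation: class 1 iff the discriminant is positive."""
    return 1 if discriminant(y, pi1, mu1, mu2, sigma) > 0 else 2


def entropy(y, pi1, mu1, mu2, sigma):
    """Shannon entropy of the posterior class probabilities at y."""
    return -sum(t * math.log(t) for t in posterior_probs(y, pi1, mu1, mu2, sigma) if t > 0)


def prob_label_missing(y, pi1, mu1, mu2, sigma, xi0, xi1):
    """ASSUMED missingness model: logistic in the log-entropy. With xi1 > 0,
    points near the class boundary (high entropy) are more likely unlabelled.
    xi0 and xi1 are hypothetical parameters."""
    # Floor the entropy so log() is defined when a posterior rounds to 0 or 1.
    e = max(entropy(y, pi1, mu1, mu2, sigma), 1e-300)
    return _logistic(xi0 + xi1 * math.log(e))


def optimal_error_rate(mu1, mu2, sigma):
    """Bayes error rate Phi(-Delta/2) for equal priors, where
    Delta = |mu1 - mu2| / sigma is the Mahalanobis distance."""
    delta = abs(mu1 - mu2) / sigma
    return 0.5 * math.erfc(delta / (2.0 * math.sqrt(2.0)))
```

With equal priors, means 0 and 2, and unit variance, the decision boundary is y = 1. There the posterior is (0.5, 0.5) and the entropy reaches its maximum log 2, so under this assumed mechanism a label is most likely to be missing near the boundary. The optimal error rate in that configuration is Φ(-1), roughly 0.159.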
statistics & probability