Cluster analysis and outlier detection with missing data

Hung Tong, Cristina Tortora
DOI: https://doi.org/10.48550/arXiv.2012.05394
2020-12-10
Abstract: A mixture of multivariate contaminated normal (MCN) distributions is a useful model-based clustering technique to accommodate data sets with mild outliers. However, this model only works when fitted to complete data sets, which is often not the case in real applications. In this paper, we develop a framework for fitting a mixture of MCN distributions to incomplete data sets, i.e., data sets with some values missing at random. We employ the expectation-conditional maximization algorithm for parameter estimation. We use a simulation study to compare the results of our model and a mixture of Student's t distributions for incomplete data.
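To make the model in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual contaminated normal form, in which a single component has density alpha * N(x; mu, Sigma) + (1 - alpha) * N(x; mu, eta * Sigma), with alpha the proportion of typical points and eta > 1 the inflation factor for outliers. It also assumes that missing entries are encoded as NaN and handled by evaluating the density on the observed coordinates only, which is valid because the marginals of a multivariate normal are normal. The function names `mvn_pdf` and `mcn_observed_pdf` are hypothetical.

```python
import numpy as np


def mvn_pdf(x, mean, cov):
    """Multivariate normal density evaluated at a single point x."""
    d = x.shape[0]
    diff = x - mean
    # Solve cov @ sol = diff instead of forming the inverse explicitly.
    sol = np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    log_pdf = -0.5 * (d * np.log(2.0 * np.pi) + logdet + diff @ sol)
    return float(np.exp(log_pdf))


def mcn_observed_pdf(x, mean, cov, alpha, eta):
    """Contaminated normal density of the observed entries of x.

    Assumption: missing values are marked with NaN. The density is
    evaluated on the observed sub-vector, using the matching sub-vector
    of the mean and sub-matrix of the covariance.
    """
    obs = ~np.isnan(x)
    if not obs.any():
        return 1.0  # nothing observed, so the marginal integrates to 1
    x_o = x[obs]
    mu_o = mean[obs]
    cov_oo = cov[np.ix_(obs, obs)]
    typical = mvn_pdf(x_o, mu_o, cov_oo)        # "good" points
    outlier = mvn_pdf(x_o, mu_o, eta * cov_oo)  # inflated covariance
    return alpha * typical + (1.0 - alpha) * outlier


# Example: a 2-D point whose second coordinate is missing.
x = np.array([0.0, np.nan])
density = mcn_observed_pdf(x, np.zeros(2), np.eye(2), alpha=0.9, eta=4.0)
```

In a full mixture, a weighted sum of such component densities would give the observed-data likelihood that an expectation-conditional maximization algorithm maximizes. The parameter updates themselves are not shown here.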