Robust $K$-Means-type Clustering for Noisy Data
Xi Xiao, Hailong Ma, Guojun Gan, Qing Li, Bin Zhang, Shutao Xia
DOI: https://doi.org/10.1109/tnnls.2024.3392211
IF: 14.255
2024-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Data clustering is a fundamental machine learning task that seeks to partition a dataset into homogeneous groups. However, real data usually contain noise, which poses significant challenges to clustering algorithms. In this article, motivated by how the $k$-means algorithm is derived from a Gaussian mixture model (GMM), we propose a robust $k$-means-type algorithm, named $k$-means-type clustering based on $t$-distribution (KMTD), by assuming that the data points are drawn from a special multivariate $t$-mixture model (TMM). Because the $t$-distribution has a fatter tail than the Gaussian distribution, the proposed algorithm is more robust to noise. Like the $k$-means algorithm, the proposed algorithm is simpler than those based on a full TMM. Both synthetic and real data are used to illustrate the proposed algorithm's performance and efficiency. The experimental results demonstrate that the proposed algorithm runs faster than other sophisticated algorithms and, in most cases, achieves higher accuracy than the other algorithms.
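The abstract does not state the KMTD update rules, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes spherical unit-scale clusters and a fixed degrees-of-freedom value `nu`, and it reuses the standard weight $(\nu + d)/(\nu + \lVert x - \mu \rVert^2)$ that arises when fitting a multivariate $t$-distribution by EM. The function name `t_weighted_kmeans` and all of its parameters are hypothetical.

```python
import numpy as np

def t_weighted_kmeans(X, k, nu=3.0, n_iter=50, init=None, seed=0):
    """k-means-style loop with t-distribution-inspired point weights.

    Illustrative assumption, not the KMTD algorithm from the paper:
    points far from their center get weight (nu + d) / (nu + dist^2),
    so outliers pull the center less than in plain k-means.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if init is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.asarray(init, dtype=float).copy()

    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)              # hard assignment, as in k-means
        w = (nu + d) / (nu + d2[np.arange(n), labels])

        new_centers = centers.copy()
        for j in range(k):
            mask = labels == j
            if mask.any():                      # keep the old center if a cluster empties
                new_centers[j] = np.average(X[mask], axis=0, weights=w[mask])

        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break

    # Final assignment against the returned centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)
```

As `nu` grows, the weights approach 1 for every point and the loop reduces to ordinary $k$-means, which mirrors the Gaussian limit of the $t$-distribution.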