t-k-means: A k-means Variant with Robustness and Stability

Yang Zhang, Qingtao Tang, Yiming Li, Weipeng Huang, Shutao Xia
2019-01-01
Abstract: Lloyd’s k-means algorithm is one of the most classical clustering methods and is widely used in data mining or as a data pre-processing procedure. However, due to the thin-tailed property of the Gaussian distribution, k-means performs relatively poorly on heavy-tailed data or data containing outliers. In addition, k-means has relatively weak stability, i.e., its result has a large variance, which reduces the credibility of the model. In this paper, we propose a robust and stable k-means variant, t-k-means, as well as a fast version of it, for solving the flat clustering problem. Theoretically, we detail the derivation of t-k-means and analyze its robustness and stability from the aspects of the loss function, the influence function, and the expression of the clustering center. Extensive experiments empirically demonstrate that our method is sound while preserving running efficiency.