HIBOG: Improving the clustering accuracy by ameliorating dataset with gravitation
Qi Li,Shuliang Wang,Chuanfeng Zhao,Boxiang Zhao,Xin Yue,Jing Geng
DOI: https://doi.org/10.1016/j.ins.2020.10.046
IF: 8.1
2021-03-01
Information Sciences
Abstract:<p>Clustering is an important technique applied in many fields. To obtain more accurate results, most researchers focus only on clustering algorithms. However, this is not an optimal strategy, because each algorithm has its own advantages and disadvantages, and no single algorithm yields satisfactory results on all datasets. This paper focuses on datasets instead and proposes a method called <em>HIBOG</em> that improves clustering accuracy by ameliorating datasets with gravitation. <em>HIBOG</em> helps many clustering algorithms obtain better results on more datasets by ameliorating the data so that similar objects move closer together and dissimilar objects move further apart. As a result, ameliorated datasets are friendlier to many clustering algorithms than the original datasets. Although datasets are diverse, <em>HIBOG</em> can cope with this diversity to some extent because it is robust to high-dimensional datasets, Gaussian-distributed datasets, shaped datasets, and datasets with highly overlapping clusters. We conducted numerous experiments on real-world datasets to verify the effectiveness of <em>HIBOG</em>. The experiments show that <em>HIBOG</em> improves the accuracy of different clustering algorithms, with accuracy increasing by an average of 113.4% (excluding the maximum and minimum). Moreover, compared with other similar methods, <em>HIBOG</em> yields much larger improvements in clustering accuracy and dramatically shortens the running time. We also conducted 360 experiments, each with different parameter values. These experiments show that most values enable <em>HIBOG</em> to ameliorate datasets, indicating that <em>HIBOG</em> is strongly robust to parameter adjustment.</p>
computer science, information systems
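The abstract describes ameliorating a dataset so that similar objects move closer together before any clustering algorithm is run. The sketch below illustrates that general idea only. It is not the HIBOG algorithm, whose force model is not given in this record. As an assumption, each point is simply pulled part of the way toward the centroid of its nearest neighbours, and the function name `attract_to_neighbors` and the parameters `k`, `step`, and `iterations` are hypothetical.

```python
import numpy as np


def attract_to_neighbors(X, k=5, step=0.5, iterations=3):
    """Pull every point toward the centroid of its k nearest neighbours.

    Illustrative stand-in for a gravitation-style preprocessing step;
    the update rule is an assumption, not the method from the paper.
    """
    X = np.asarray(X, dtype=float).copy()
    for _ in range(iterations):
        # Pairwise Euclidean distances, shape (n, n).
        diff = X[:, None, :] - X[None, :, :]
        dist = np.linalg.norm(diff, axis=2)
        # A point must not count as its own neighbour.
        np.fill_diagonal(dist, np.inf)
        # Indices of the k nearest neighbours of each point, shape (n, k).
        nearest = np.argsort(dist, axis=1)[:, :k]
        # Centroid of each point's neighbours, shape (n, d).
        centroid = X[nearest].mean(axis=1)
        # Move each point a fraction `step` of the way to that centroid.
        X = X + step * (centroid - X)
    return X


# Two small, well-separated groups of points.
points = np.array([[0, 0], [1, 0], [0, 1],
                   [10, 10], [11, 10], [10, 11]], dtype=float)
ameliorated = attract_to_neighbors(points, k=2, step=0.5, iterations=1)
```

The returned array has the same shape as the input, so it could be passed to any clustering algorithm in place of the original data. Whether such a step helps a given algorithm depends on the dataset and on the parameter values chosen.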