Detecting threshold parameters using entropy analysis in density peaks clustering

Weng Yu, Gong Wendong, Yuan Jie
DOI: https://doi.org/10.1145/3194206.3194221
2018-01-01
Abstract: Density peaks clustering (DPC), introduced as clustering by fast search and find of density peaks, selects density peaks as cluster centers and then assigns each remaining point to the same cluster as its nearest neighbor of higher density. DPC can classify elements into clusters very well, but the algorithm requires manual intervention and decision making to select the cluster centers. Compared with the traditional K-means algorithm, it does not require the number of clusters as input, but it does require a distance threshold and cluster border thresholds, so a human must still choose some thresholds during clustering. This paper proposes a method based on information entropy and backtracking enumeration that automatically discovers possible distance thresholds, so no manual intervention is needed during clustering. The idea rests on the observation that information entropy is relatively stable when the clustering state is relatively stable. The method then automatically predicts and recommends some clustering results. On several test cases, the paper found that the method makes it easier to obtain clustering results as good as those of manual threshold selection.
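The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the entropy is computed or how the backtracking enumeration proceeds. The sketch assumes the Shannon entropy of the cluster-size distribution as the entropy measure, and it replaces backtracking enumeration with a simple search for the longest run of candidate distance thresholds over which that entropy stays constant. All function names, the center-selection rule `delta_ratio`, the tolerance `tol`, and the toy data are hypothetical.

```python
import numpy as np

def dpc_quantities(points, d_c):
    """Return density rho, distance delta to the nearest denser point, and bookkeeping."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Cutoff-kernel density: number of other points closer than d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    n = len(points)
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention delta is its largest distance.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]  # points ranked as denser (ties broken by order)
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    return rho, delta, nearest_higher, order

def dpc_labels(points, d_c, delta_ratio=0.5):
    """Cluster with DPC; centers are chosen by a hypothetical rule on delta."""
    rho, delta, nearest_higher, order = dpc_quantities(points, d_c)
    # Assumed rule (not from the paper): a center has delta above a fraction of the maximum.
    is_center = delta > delta_ratio * delta.max()
    labels = np.full(len(points), -1)
    next_label = 0
    for i in order:  # decreasing density, so nearest_higher[i] is already labelled
        if is_center[i] or nearest_higher[i] < 0:
            labels[i] = next_label
            next_label += 1
        else:
            labels[i] = labels[nearest_higher[i]]
    return labels

def cluster_entropy(labels):
    """Shannon entropy (in bits) of the cluster-size distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def stable_thresholds(candidates, entropies, tol=1e-6):
    """Return the longest run of candidates whose consecutive entropies differ by at most tol."""
    best, start = (0, 0), 0
    for k in range(1, len(entropies) + 1):
        if k == len(entropies) or abs(entropies[k] - entropies[k - 1]) > tol:
            if k - start > best[1] - best[0]:
                best = (start, k)
            start = k
    return list(candidates[best[0]:best[1]])

# Toy data: two tight groups of five points each, far apart.
group = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.1, 0.1], [0.1, -0.1]])
points = np.vstack([group, group + np.array([10.0, 0.0])])

candidates = [0.15, 0.25, 0.5, 1.0]  # candidate distance thresholds d_c
entropies = [cluster_entropy(dpc_labels(points, d_c)) for d_c in candidates]
recommended = stable_thresholds(candidates, entropies)
print(list(zip(candidates, entropies)), recommended)
```

In this sketch, a threshold is recommended when its neighbors in the candidate list produce the same entropy, which mirrors the abstract's observation that entropy is stable when the clustering state is stable. A real data set would need a finer candidate grid and a tolerance tuned to the noise in the entropy curve.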