New criteria for evaluating the validity of clustering
Juanying XIE, Ying ZHOU, Mingzhao WANG, Weiliang JIANG
DOI: https://doi.org/10.11992/tis.201706029
2018-01-01
Abstract: There are two kinds of criteria for evaluating the clustering ability of a clustering algorithm: internal and external. Current external evaluation indexes fail to account for skewed clustering results, and it is difficult to obtain the optimal number of clusters from the validity results produced by internal evaluation indexes. To address these defects in the present internal and external indexes, we propose two external evaluation indexes that consider both positive and negative information. They are based on the contingency table and on sample pairs, respectively, and are intended for evaluating clustering results on datasets with arbitrary distributions. We also propose using variance to measure both the tightness of a cluster and the separability between clusters, and we use the ratio of these two quantities as a new internal evaluation index. Experiments on datasets from the UCI (University of California, Irvine) machine learning repository and on artificially simulated datasets show that the proposed internal index can effectively find the true number of clusters in a dataset. The proposed external indexes based on the contingency table and sample pairs are effective and can be used to evaluate clustering results on existing types of skewed and noisy data.
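The abstract names two building blocks for the external indexes: the contingency table and sample pairs. The abstract does not give the proposed index formulas, so the sketch below only illustrates those two standard ingredients. The function names `contingency_table` and `pair_counts` are hypothetical and chosen for illustration. Counting pairs that are correctly kept apart (the fourth value returned) is one plausible reading of "negative information", which is an assumption on our part.

```python
from collections import Counter
from math import comb


def contingency_table(truth, pred):
    """Count samples for each (true class, predicted cluster) combination."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    return Counter(zip(truth, pred))


def pair_counts(truth, pred):
    """Return (tp, fp, fn, tn) over all unordered sample pairs.

    tp: same class, same cluster
    fp: different class, same cluster
    fn: same class, different cluster
    tn: different class, different cluster
    """
    table = contingency_table(truth, pred)
    n = len(truth)

    # Pairs that share a cell of the table agree on both class and cluster.
    tp = sum(comb(count, 2) for count in table.values())
    # Row and column marginals give pairs sharing a class or a cluster.
    same_class = sum(comb(count, 2) for count in Counter(truth).values())
    same_cluster = sum(comb(count, 2) for count in Counter(pred).values())

    fn = same_class - tp
    fp = same_cluster - tp
    tn = comb(n, 2) - tp - fp - fn
    return tp, fp, fn, tn


if __name__ == "__main__":
    # Toy labels (illustrative only, not data from the paper).
    truth = [0, 0, 0, 1, 1, 1]
    pred = [0, 0, 1, 1, 1, 1]
    print(dict(contingency_table(truth, pred)))
    print(pair_counts(truth, pred))
```

Any pair-based external index can be written as a function of these four counts; an index that ignores `tn` uses only positive information, which may matter when cluster sizes are heavily skewed.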