DEA-based Internal Validity Index for Clustering

Jing Zhao, Qingxian An
DOI: https://doi.org/10.1080/01605682.2024.2348621
IF: 3.6
2024-01-01
Journal of the Operational Research Society
Abstract: Internal validity indices are crucial in evaluating the quality of clustering results, serving as valuable tools for comparing various clustering algorithms and determining the optimal number of clusters for datasets. Most existing internal validity indices use the worst-case scenario to represent the overall validity. Moreover, some indices assign equal weights to distances among different clusters, even when these distances might have varying degrees of influence on the overall validity. Data envelopment analysis (DEA) is an effective technique for evaluating the performance of decision-making units through the computation of the ratio of the weighted sum of outputs to the weighted sum of inputs. The weight assigned to each indicator signifies its degree of influence on efficiency. Furthermore, DEA can be viewed as a multiple-criteria evaluation methodology, wherein inputs and outputs are two sets of performance criteria. We propose a DEA-based internal validity index (DEAI) to evaluate the validity of the clustering results. In this approach, intra-cluster compactness and inter-cluster separation are employed for determining the input(s) and output(s). The DEAI is then applied to the artificial datasets and empirical examples. Experimental results illustrate that DEAI outperforms six classic internal validity indices in accurately identifying the optimal cluster across all 10 datasets.
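For readers unfamiliar with DEA, the ratio described in the abstract can be written in the standard CCR ratio form shown below. This is a sketch of the textbook model only, not the paper's DEAI formulation, which the abstract does not spell out. The symbols are assumptions introduced for illustration: $x_{ij}$ is input $i$ of decision-making unit $j$, $y_{rj}$ is output $r$ of unit $j$, $v_i$ and $u_r$ are the input and output weights, and $o$ is the unit being evaluated. In the DEAI setting one would presumably derive the inputs and outputs from intra-cluster compactness and inter-cluster separation, but that mapping is not given here.

```latex
% Standard CCR ratio model (assumed textbook form, not taken from the paper)
\begin{align}
\max_{u,\,v}\quad & \theta_o = \frac{\sum_{r=1}^{s} u_r\, y_{ro}}{\sum_{i=1}^{m} v_i\, x_{io}} \\
\text{s.t.}\quad  & \frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \le 1, \qquad j = 1, \dots, n, \\
                  & u_r \ge 0,\quad v_i \ge 0, \qquad r = 1, \dots, s,\; i = 1, \dots, m.
\end{align}
```

Because each unit chooses the weights most favorable to itself, subject to no unit exceeding an efficiency of 1, the weights are determined by the data rather than fixed in advance. This is the property the abstract contrasts with indices that assign equal weights to all inter-cluster distances.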