Automatic Recommendation of a Distance Measure for Clustering Algorithms

Xiaoyan Zhu, Yingbin Li, Jiayin Wang, Tian Zheng, Jingwen Fu
DOI: https://doi.org/10.1145/3418228
IF: 4.157
2020-01-01
ACM Transactions on Knowledge Discovery from Data
Abstract: With a large number of distance measures available, choosing an appropriate one for clustering a given data set with a specified clustering algorithm becomes an important problem. In this article, an automatic distance measure recommendation method for clustering algorithms is proposed. The recommendation method consists of the following steps: (1) metadata extraction, including meta-feature collection and meta-target identification; (2) recommendation model construction using the metadata; and (3) distance measure recommendation for a new data set by the recommendation model. Two types of meta-targets and meta-learning techniques are used to accommodate the possibly different requirements of users. To validate the necessity and effectiveness of the distance measure recommendation method, an empirical study is conducted with 199 publicly available data sets, 9 distance measures, and 2 widely used clustering algorithms. The experimental results indicate that the distance measure significantly influences the performance of the clustering algorithm on a given data set. Furthermore, performance analysis of the proposed recommendation method demonstrates its effectiveness.
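The three steps listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the meta-features (log of the number of instances, number of attributes, mean attribute standard deviation), the 1-nearest-neighbor recommender, the function names, and the toy data sets with their "best" distance measures are all assumptions made here for illustration only.

```python
import math
from statistics import mean, pstdev


def extract_meta_features(data):
    """Hypothetical meta-features: log(#instances), #attributes, mean attribute std."""
    n_rows = len(data)
    n_cols = len(data[0])
    col_stds = [pstdev([row[j] for row in data]) for j in range(n_cols)]
    return [math.log(n_rows), float(n_cols), mean(col_stds)]


def build_metadata(datasets, best_measures):
    """Step 1: pair each data set's meta-features with its meta-target.

    The meta-target here is assumed to be the name of the distance measure
    that performed best on that data set in some earlier evaluation.
    """
    return [(extract_meta_features(d), m) for d, m in zip(datasets, best_measures)]


def recommend(metadata, new_data):
    """Steps 2 and 3: a 1-nearest-neighbor stand-in for the recommendation model.

    Returns the meta-target of the historical data set whose meta-features
    are closest (Euclidean distance) to those of the new data set.
    """
    query = extract_meta_features(new_data)
    _, label = min(metadata, key=lambda item: math.dist(item[0], query))
    return label


# Toy historical data sets and assumed meta-targets (illustrative, not from the paper).
historical = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    [[0.0, 10.0, 5.0], [100.0, -10.0, 50.0], [50.0, 0.0, -50.0], [25.0, 5.0, 0.0]],
]
best = ["euclidean", "cosine"]

metadata = build_metadata(historical, best)
new_data = [[0.5, 0.5], [1.5, 1.0], [2.5, 2.2]]
print(recommend(metadata, new_data))  # prints: euclidean
```

The nearest-neighbor lookup is used only because it keeps the sketch short. Any classifier or ranking model trained on the (meta-feature, meta-target) pairs could take its place, and a realistic setup would use far more meta-features and data sets.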