A Comparison Study of Similarity Measures for Covering-Based Neighborhood Classifiers.
Fu-Lun Liu, Ben-Wen Zhang, Davide Ciucci, Wei-Zhi Wu, Fan Min
DOI: https://doi.org/10.1016/j.ins.2018.03.030
IF: 8.1
2018-01-01
Information Sciences
Abstract: In data mining, neighborhood classifiers are valid not only for numeric data but also for symbolic data. The key issue for a neighborhood classifier is how to measure the similarity between two instances. In this paper, we compare six similarity measures for symbolic data, namely Overlap, Eskin, occurrence frequency (OF), inverse occurrence frequency (IOF), Goodall3, and Goodall4, under the framework of a covering-based neighborhood classifier. In the training stage, a covering of the universe is built based on the given similarity measure. A covering reduction algorithm then removes some of the covering blocks and determines the representatives. In the testing stage, the similarities between each unlabeled instance and the representatives are computed. The closest representative, or a few of the closest representatives, determines the predicted class label of the unlabeled instance. We compare the six similarity measures in experiments on 15 University of California-Irvine (UCI) datasets. The results demonstrate that although no measure dominates the others in all scenarios, some measures perform consistently well. The covering-based neighborhood classifier with an appropriate similarity measure, such as Overlap, IOF, or OF, performs better than the ID3, C4.5, and Naive Bayes classifiers. © 2018 Elsevier Inc. All rights reserved.
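The testing stage described in the abstract can be illustrated with a minimal sketch, assuming the simplest of the six measures (Overlap, taken here in its commonly used form: the fraction of attributes on which two symbolic instances agree) and the single-closest-representative rule. The function names `overlap_similarity` and `predict_label`, the toy attribute values, and the representation of representatives as `(instance, label)` pairs are hypothetical choices made for illustration; the paper's covering construction and covering reduction steps are not reproduced here.

```python
from typing import Hashable, List, Sequence, Tuple

# Assumption: a symbolic instance is a fixed-length sequence of attribute values.
Instance = Sequence[Hashable]


def overlap_similarity(x: Instance, y: Instance) -> float:
    """Overlap measure: fraction of attributes on which x and y agree."""
    if len(x) != len(y):
        raise ValueError("instances must have the same number of attributes")
    if len(x) == 0:
        raise ValueError("instances must have at least one attribute")
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def predict_label(instance: Instance,
                  representatives: List[Tuple[Instance, str]]) -> str:
    """Return the label of the most similar representative.

    Assumption: the representatives are already given (in the paper they
    come from the training stage). Ties go to the earliest representative.
    """
    if not representatives:
        raise ValueError("at least one representative is required")
    best = max(representatives,
               key=lambda rep: overlap_similarity(instance, rep[0]))
    return best[1]


# Toy data, invented purely for illustration.
representatives = [
    (("red", "small", "round"), "apple"),
    (("yellow", "large", "long"), "banana"),
]
unlabeled = ("red", "large", "round")
predicted = predict_label(unlabeled, representatives)
```

With these toy values the unlabeled instance matches the first representative on two of three attributes and the second on one of three, so the first representative's label is chosen. Swapping in another measure only requires replacing `overlap_similarity` with a function of the same signature.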