NearCount: Selecting critical instances based on the cited counts of nearest neighbors

Zonghai Zhu, Zhe Wang, Dongdong Li, Wenli Du
DOI: https://doi.org/10.1016/j.knosys.2019.105196
IF: 8.139
2020-01-01
Knowledge-Based Systems
Abstract: Traditional instance selection algorithms are not good at addressing imbalanced problems. Moreover, most of them are sensitive to noise instances and suffer from complex selection rules. To solve these problems, we propose a concise learning framework named NearCount, which determines the importance of each instance without editing noise. In NearCount, the importance of an instance corresponds to its cited count, that is, the number of times the instance is selected as a nearest neighbor of instances in different classes. Among the instances with nonzero cited counts, the importance of an instance is inversely proportional to its cited count. To handle classification problems with different data distributions, two NearCount-based algorithms, NearCount-IM and NearCount-IS, are introduced. For imbalanced problems, NearCount-IM selects as many important majority instances as there are minority instances, thus balancing the data distribution. For balanced scenarios, NearCount-IS selects, in every class, the instances whose cited counts are greater than zero and less than or equal to the number of nearest neighbors as critical instances. The proposed NearCount-IM and NearCount-IS algorithms are evaluated against classical instance selection algorithms on benchmark data sets. Experiments validate the effectiveness of the proposed algorithms.