Nearest Neighbor-Based Instance Selection for Classification

Guanghua Yu, Jin Tian, Minqiang Li
DOI: https://doi.org/10.1109/fskd.2016.7603154
2016-01-01
Abstract: As datasets grow, classifiers increasingly face intractable computation and storage costs. Moreover, decision boundaries in complex classification problems are often complicated and circuitous, and modeling on too many instances can cause oversensitivity to noise and degrade learning accuracy. Instance selection offers an effective way to improve classification performance by training on a partial but significant subset of the data. This paper presents a novel instance selection algorithm based on nearest enemy information. The dataset is divided into several partitions according to the instances' nearest enemies (for each instance, the closest instance of a different class). In every partition, representative instances are selected based on distribution information so that both sides of the decision boundary are represented. A support vector machine (SVM) is then used to construct the classification model from these representative instances. Experimental results show that the proposed algorithm outperforms several conventional instance selection methods, achieving higher classification accuracy with fewer selected instances.
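The partition-then-select idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual selection rule, so the rule used here is an assumption made purely for illustration: each partition keeps its shared nearest enemy plus the `per_partition` members closest to that enemy. The function names `nearest_enemy` and `select_instances`, the parameter `per_partition`, and the toy data are all hypothetical, and the sketch assumes at least two classes are present.

```python
from collections import defaultdict
from math import dist


def nearest_enemy(i, X, y):
    """Index of the closest instance to X[i] that has a different label."""
    enemies = [j for j in range(len(X)) if y[j] != y[i]]
    return min(enemies, key=lambda j: dist(X[i], X[j]))


def select_instances(X, y, per_partition=1):
    """Partition instances by nearest enemy, then keep a few per partition.

    Assumed selection rule (not taken from the paper): in each partition,
    keep the shared nearest enemy plus the `per_partition` members that
    lie closest to it.
    """
    # Group instance indices by the index of their nearest enemy.
    partitions = defaultdict(list)
    for i in range(len(X)):
        partitions[nearest_enemy(i, X, y)].append(i)

    selected = set()
    for enemy, members in partitions.items():
        members.sort(key=lambda i: dist(X[i], X[enemy]))
        selected.add(enemy)                        # one side of the boundary
        selected.update(members[:per_partition])   # the other side
    return sorted(selected)


# Toy 1-D data: class 0 on the left, class 1 on the right.
X = [(0.0,), (1.0,), (2.0,), (5.0,), (6.0,), (7.0,)]
y = [0, 0, 0, 1, 1, 1]
selected = select_instances(X, y)
print(selected)  # [2, 3]: the two points facing each other across the gap
```

The retained subset (`[X[i] for i in selected]` with the matching labels) could then be used to fit an SVM, for example with scikit-learn's `SVC`, in place of the full training set.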