Instance Selection Based on Redundant Instance Pair Elimination Algorithm

Lu LIU, Qiang GAO, Yan-heng LIU, Xin SUN
DOI: https://doi.org/10.3969/j.issn.1000-3428.2014.01.037
2014-01-01
Abstract: Instance selection is an effective way to remove noisy and redundant data. Existing instance selection methods struggle to balance generalization ability against reduction rate, so this paper proposes a new instance selection method, the Redundant Instance Pair Elimination (RIPE) algorithm. It introduces the concept of the nearest similar pair, computes the nearest similar pairs in a dataset, and removes the instances that qualify for elimination. Simulation experiments on 11 different datasets show that, compared with the original datasets, the processed datasets yield clearly better classification accuracy and storage compression ratio. Compared with the Edited Nearest Neighbor (ENN) rule, the algorithm maintains classification accuracy, improves the average storage compression ratio by more than 35%, preserves the data distribution of the original datasets, and strikes a better balance between classification accuracy and storage compression ratio.
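The abstract does not spell out how a nearest similar pair is defined or which instances are removed, so the following Python sketch is only an illustration under stated assumptions, not the authors' RIPE implementation. The function enn follows the standard Edited Nearest Neighbor rule (drop an instance whose label disagrees with the majority of its k nearest neighbours). The function drop_redundant_pairs is a hypothetical reading in which a "nearest similar pair" is two same-class instances that are each other's nearest neighbour, and one member of each such pair is dropped. The function compression_ratio assumes the storage compression ratio is the fraction of instances removed. All function names and the Euclidean distance are assumptions.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def k_nearest(i, X, k):
    """Indices of the k nearest neighbours of X[i], excluding i itself."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: dist(X[i], X[j]))
    return others[:k]


def enn(X, y, k=3):
    """Edited Nearest Neighbor: keep an instance only if its label
    matches the majority label among its k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in k_nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return keep


def drop_redundant_pairs(X, y):
    """Hypothetical pair elimination (an assumption, not the paper's rule):
    if two same-class instances are mutual nearest neighbours,
    keep the lower index and drop the other."""
    nn = [k_nearest(i, X, 1)[0] for i in range(len(X))]
    dropped = set()
    for i, j in enumerate(nn):
        if i < j and nn[j] == i and y[i] == y[j]:
            dropped.add(j)
    return [i for i in range(len(X)) if i not in dropped]


def compression_ratio(n_original, n_kept):
    """Assumed definition: fraction of instances removed."""
    return 1 - n_kept / n_original


if __name__ == "__main__":
    # Two tight same-class pairs plus one looser point per class.
    X = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (5.0, 0.0), (5.1, 0.0), (6.0, 0.0)]
    y = ["a", "a", "a", "b", "b", "b"]
    kept = drop_redundant_pairs(X, y)
    print(kept, compression_ratio(len(X), len(kept)))
```

Under these assumptions, ENN targets mislabeled or noisy points near class boundaries, while the pair elimination step targets near-duplicate points inside a class, which is one plausible way a method could raise the compression ratio without moving the decision boundary.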