An Approach to Sample Selection from Big Data for Classification

Sheng Xing, Yulin He, Hong Zhu, Xizhao Wang
DOI: https://doi.org/10.1109/smc.2016.7844685
2016-01-01
Abstract: When traditional sample selection methods are used to compress large data sets, their computational complexity is very high and they are time-consuming. To avoid these shortcomings, we propose a new sample selection method based on non-stable cut points. Exploiting a basic property of convex functions, namely that their extreme values occur at the endpoints of intervals, the method measures the degree to which each sample lies at an interval endpoint by labeling non-stable cut points. Samples with a higher endpoint degree are then selected, which avoids computing distances between samples. The method aims to compress data sets and improve computational efficiency without degrading classification accuracy. Experiments show that the proposed algorithm performs well in compressing data sets with a higher degree of class imbalance, and they also confirm that the method is strongly resistant to noise.
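The abstract does not spell out how a sample's endpoint degree is computed. The following is a minimal Python sketch of one plausible reading, under these assumptions: (i) a cut point between two adjacent distinct values of an attribute is stable when every sample on both sides carries the same single class label, and non-stable otherwise; (ii) a sample's endpoint degree is the number of attributes on which it lies in a value block adjacent to a non-stable cut point; (iii) the top `keep_ratio` fraction of samples by degree is kept. The functions `endpoint_degree` and `select_samples` and the parameter `keep_ratio` are hypothetical illustrations, not the authors' published algorithm.

```python
import numpy as np


def endpoint_degree(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hypothetical endpoint degree: for each sample, count the attributes on
    which it sits in a value block next to a non-stable cut point."""
    n, d = X.shape
    degree = np.zeros(n, dtype=int)
    for j in range(d):
        # Sort samples by attribute j and group them into equal-value blocks.
        order = np.argsort(X[:, j], kind="stable")
        values = X[order, j]
        starts = np.flatnonzero(np.r_[True, values[1:] != values[:-1]])
        blocks = np.split(order, starts[1:])
        marked = np.zeros(n, dtype=bool)
        for left, right in zip(blocks[:-1], blocks[1:]):
            # Assumed definition: the cut is stable only if both neighbouring
            # blocks contain one and the same class label.
            if len(set(y[left]) | set(y[right])) > 1:
                marked[left] = True
                marked[right] = True
        # Each attribute contributes at most 1 to a sample's degree.
        degree += marked
    return degree


def select_samples(X: np.ndarray, y: np.ndarray, keep_ratio: float = 0.3) -> np.ndarray:
    """Hypothetical selection rule: keep the samples with the highest degree."""
    degree = endpoint_degree(X, y)
    k = max(1, int(round(keep_ratio * len(y))))
    # Highest degree first; the stable sort keeps ties in original order.
    idx = np.argsort(-degree, kind="stable")[:k]
    return np.sort(idx)


if __name__ == "__main__":
    X = np.array([[1, 1], [2, 1], [3, 2], [4, 2], [5, 3], [6, 3]], dtype=float)
    y = np.array([0, 0, 0, 1, 1, 1])
    print(endpoint_degree(X, y).tolist())  # [1, 1, 2, 2, 1, 1]
    print(select_samples(X, y, keep_ratio=0.34).tolist())  # [2, 3]
```

Each attribute needs only a sort and a linear scan, so this sketch runs in O(d n log n) time for n samples and d attributes and never computes pairwise distances, which is the efficiency argument the abstract makes. It does not attempt to reproduce the paper's treatment of class imbalance or noise.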