Under-sampling Method Research in Class-Imbalanced Data

Lin Shuyang, Li Cuihua, Jiang Yi, Lin Chen, Zou Quan
2011-01-01
Journal of Computer Research and Development
Abstract: An under-sampling method is proposed for class-imbalanced data classification that addresses classifier overfitting and thereby improves classification ability. The K-Means method is used to cluster the majority (large) class; the cluster centers are extracted and merged with the minority-class samples to form a balanced sample set for training the classifiers. When the minority class is very small, clustering-based under-sampling alone would leave the training set excessively sparse. To avoid this, the SMOTE over-sampling algorithm is combined with cluster-based under-sampling, which both avoids introducing too much noise and resolves the shortage of samples. Experiments on six groups of test data and five groups of bioinformatics data demonstrate the method's validity on class-imbalanced data sets.
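The abstract does not give the K-Means initialisation, the SMOTE neighbour count, or the rule for deciding when the minority class is "too small", so the following NumPy sketch fills those in with stated assumptions: plain Lloyd's k-means with random initialisation, SMOTE with k = 5 nearest neighbours, and a hypothetical threshold `min_size` below which the minority class is first enlarged with SMOTE. The number of majority-class cluster centers is then set equal to the (possibly enlarged) minority size. The names `kmeans_centers`, `smote`, `cluster_smote_balance` and `min_size` are illustrative only and do not come from the paper.

```python
import numpy as np


def kmeans_centers(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (assumed variant); returns a (k, d) array of centers."""
    rng = np.random.default_rng(seed)
    # Initialise with k distinct training points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: shape (n, k).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def smote(X_min, n_new, k=5, seed=0):
    """SMOTE: interpolate between a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # requires at least two minority samples
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbours
    synth = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)  # random minority sample
        b = nn[a, rng.integers(k)]  # one of its neighbours, chosen at random
        gap = rng.random()  # position along the segment, in [0, 1)
        synth[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synth


def cluster_smote_balance(X_maj, X_min, min_size=50, k_smote=5, seed=0):
    """Hypothetical combination: SMOTE a tiny minority class, then replace the
    majority class by as many K-Means centers as there are minority samples."""
    if len(X_min) < min_size:  # assumed "too small" rule
        X_min = np.vstack([X_min, smote(X_min, min_size - len(X_min), k_smote, seed)])
    target = min(len(X_min), len(X_maj))
    centers = kmeans_centers(X_maj, target, seed=seed)
    X = np.vstack([centers, X_min])
    # Label 0 = majority (cluster centers), 1 = minority (real + synthetic).
    y = np.concatenate([np.zeros(len(centers), dtype=int), np.ones(len(X_min), dtype=int)])
    return X, y


# Usage on synthetic data: 500 majority points vs. 20 minority points.
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(500, 2))
X_min = rng.normal(3.0, 0.5, size=(20, 2))
X, y = cluster_smote_balance(X_maj, X_min, min_size=50)
print(X.shape, np.bincount(y))  # expected: (100, 2) [50 50]
```

Replacing the majority class by centroids keeps one representative per cluster instead of discarding points at random, while enlarging the minority class first stops the number of centroids from collapsing to the tiny original minority count.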