Imbalanced Data Sets Classification Method Based on Over-Sampling Technique
WANG Chunyu, SU Hongye, QU Yu, CHU Jian
DOI: https://doi.org/10.3778/j.issn.1002-8331.2011.01.038
2011-01-01
Computer Engineering and Applications Journal
Abstract: Classification of data with an imbalanced class distribution is a research focus in machine learning. To address the problems caused by class imbalance, especially poor predictive accuracy on the minority class, this paper presents an improved approach, AdaBoost-SVM-OBMS, which combines Boosting, an ensemble-based learning algorithm, with an improved over-sampling method based on misclassified samples. In this approach, a support vector machine serves as the base classifier, and the misclassified samples are identified during each iteration. They are then used to generate new samples separately for the majority and minority classes. The new samples are added to the original training set to retrain the classification model, which improves the prediction of hard samples. The method is evaluated in terms of AUC, F-value, and G-mean on eight imbalanced data sets. The results indicate that the improved approach achieves high predictive performance on imbalanced data sets.
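The over-sampling step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how new samples are generated from the misclassified ones, so the rule used here (linear interpolation between a misclassified sample and its nearest same-class neighbour, in the style of SMOTE) is an assumption, not necessarily the authors' rule. The function name `oversample_misclassified` and the parameters `predict`, `n_new`, and `seed` are hypothetical. The Boosting loop and the SVM base classifier are left out; `predict` stands in for whatever classifier was trained in the current iteration.

```python
import random


def oversample_misclassified(X, y, predict, n_new=1, seed=0):
    """Return the training set extended with synthetic "hard" samples.

    X: list of feature lists; y: list of class labels;
    predict: callable mapping one feature list to a predicted label
    (a stand-in for the base classifier of the current iteration).
    """
    rng = random.Random(seed)
    new_X, new_y = [], []
    for xi, yi in zip(X, y):
        if predict(xi) == yi:
            continue  # correctly classified, so not a hard sample
        # Candidate neighbours: same class, excluding the sample itself.
        same = [xj for xj, yj in zip(X, y) if yj == yi and xj is not xi]
        if not same:
            continue  # no same-class neighbour to interpolate towards
        nearest = min(
            same,
            key=lambda xj: sum((a - b) ** 2 for a, b in zip(xi, xj)),
        )
        for _ in range(n_new):
            t = rng.random()  # interpolation factor in [0, 1)
            # ASSUMPTION: SMOTE-style interpolation; the abstract does not
            # specify the generation rule.
            new_X.append([a + t * (b - a) for a, b in zip(xi, nearest)])
            new_y.append(yi)
    # New samples are appended to the original training set, as the
    # abstract describes, so the model can be retrained on the result.
    return X + new_X, y + new_y
```

Because the new sample inherits the label of the misclassified sample it was generated from, both the majority and the minority class receive synthetic samples only where the current classifier makes errors. In a full pipeline this function would be called once per Boosting iteration, before the base classifier is retrained.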