Imbalanced Data Classification Method Based on Ensemble Learning

Yu Xiang, Yongping Xie
DOI: https://doi.org/10.1007/978-981-13-6508-9_3
2019-06-14
Abstract: Imbalanced data classification is one of the problems that have emerged as classifier learning algorithms are applied in business and industry. This paper proposes a methodology to improve the performance of imbalanced data classification. We balance the data sets using the synthetic minority oversampling technique (SMOTE) and eliminate the noise in the resulting data sets with Tomek links (T-Links). Support vector machine (SVM), k-nearest neighbor (KNN), and logistic regression (LR) are selected as the base classifiers and combined through stacked generalization to improve classification, and the final result is produced by weighted voting. Six UCI datasets are tested in the experiments, and the results show that the method is highly representative and can effectively improve classification ability.
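The resampling and voting steps named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. `smote`, `remove_tomek_links` and `weighted_vote` are hypothetical helper names. The SMOTE variant draws seed samples at random instead of iterating over every minority sample. Both members of each Tomek link are dropped, because the abstract does not say which member is removed. The voting weights are supplied by the caller, because the abstract does not say how they are derived. The stacked SVM/KNN/LR layer is left out; in practice those base classifiers would typically come from a library such as scikit-learn.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE (assumed variant): create n_new synthetic minority
    samples by interpolating between a randomly chosen minority sample and
    one of its k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: a pair of opposite-class
    samples that are each other's nearest neighbour (assumption: both
    samples are removed, not only the majority one)."""
    n = len(X)
    nearest = [
        min((j for j in range(n) if j != i), key=lambda j: math.dist(X[i], X[j]))
        for i in range(n)
    ]
    drop = set()
    for i in range(n):
        j = nearest[i]
        if y[i] != y[j] and nearest[j] == i:
            drop.update((i, j))
    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def weighted_vote(predictions, weights):
    """Combine one predicted label per classifier; each vote counts with the
    classifier's weight and the label with the largest total wins."""
    scores = {}
    for label, w in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy 2-D data: six majority (label 0) and three minority (label 1) points.
    majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.25, 0.05), (0.9, 0.9)]
    minority = [(1.0, 1.0), (1.1, 0.9), (0.95, 1.2)]
    new = smote(minority, n_new=len(majority) - len(minority), k=2)
    X = majority + minority + new
    y = [0] * len(majority) + [1] * (len(minority) + len(new))
    X_clean, y_clean = remove_tomek_links(X, y)
    print("before cleaning:", len(X), "after cleaning:", len(X_clean))
    # Hypothetical predictions from SVM, KNN and LR with hypothetical weights.
    print(weighted_vote([1, 0, 1], [0.5, 0.3, 0.2]))  # -> 1
```

In this sketch SMOTE balances the classes first and Tomek-link removal then cleans the class boundary, matching the order given in the abstract; the weighted vote would be applied to the outputs of the trained classifiers.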