A selective ensemble learning algorithm for imbalanced dataset

Du Hongle, Zhang Yan, Ke Gang
DOI: https://doi.org/10.1007/s12652-021-03453-w
IF: 3.662
2021-09-04
Journal of Ambient Intelligence and Humanized Computing
Abstract: Network intrusion behavior data is imbalanced: it contains a large amount of normal behavior data and a small amount of intrusion behavior data. Traditional selective ensemble learning algorithms lead to a high false negative rate on such data. This paper proposes a selective ensemble learning algorithm for imbalanced data based on undersampling (SELAUS). First, the algorithm uses the Bootstrap method to draw, from the majority class, as many samples as there are minority class samples, and in this way constructs multiple balanced training subsets. Then, to ensure that the resulting base classifiers differ substantially from one another, several features are randomly selected on each training subset and a decision tree is built as the base classifier using the CART algorithm. Because this procedure can also produce some poorly performing base classifiers, the algorithm selects a subset of the base classifiers to integrate rather than integrating all of them. To accurately evaluate the generalization error of a classifier on an imbalanced dataset, the paper defines a performance evaluation method for imbalanced datasets and a method for evaluating the difference between base classifiers. The generalization error of each base classifier is then calculated, and base classifiers are selected according to that error. In the weighted-voting integration, the weight of each base classifier is computed with a weight calculation method designed for imbalanced data. Finally, the validity of the algorithm is verified using UCI data, and the algorithm is applied to network intrusion detection. The simulation results show that the algorithm improves the detection rate of minority class samples, that is, it reduces the false negative rate.
computer science, information systems, telecommunications, artificial intelligence
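
The pipeline outlined in the abstract (balanced bootstrap subsets, random feature selection, a tree per subset, selection, weighted voting) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not confirm: a depth-1 Gini tree (a stump) stands in for a full CART tree, the G-mean measured on the full training set stands in for the paper's imbalance-aware performance measure and voting weight, and the paper's difference (diversity) measure between base classifiers is omitted. All function and parameter names (`gini_stump`, `fit_selective_ensemble`, `n_subsets`, `keep`, and so on) are hypothetical.

```python
import random
from math import sqrt

def gini_stump(X, y, features):
    """Fit a depth-1 tree on the given feature indices by minimizing Gini impurity.
    ASSUMPTION: a stump stands in for the full CART tree named in the abstract."""
    n, best = len(y), None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            impurity = 0.0
            for side in (left, right):
                p = sum(side) / len(side)  # fraction of minority (label 1)
                impurity += len(side) / n * 2 * p * (1 - p)
            if best is None or impurity < best[0]:
                best = (impurity, f, t,
                        int(2 * sum(left) >= len(left)),
                        int(2 * sum(right) >= len(right)))
    if best is None:  # no valid split: always predict the subset's majority label
        label = int(2 * sum(y) >= n)
        return (features[0], float("inf"), label, label)
    return best[1:]

def stump_predict(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

def g_mean(y_true, y_pred):
    """Geometric mean of minority recall and majority recall.
    ASSUMPTION: used in place of the paper's own evaluation measure."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 1)
    tn = sum(1 for a, b in zip(y_true, y_pred) if a == 0 and b == 0)
    return sqrt((tp / pos) * (tn / neg))  # requires both classes to be present

def fit_selective_ensemble(X, y, n_subsets=15, n_features=2, keep=5, seed=0):
    """Label 1 is the minority class, label 0 the majority class."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    scored = []
    for _ in range(n_subsets):
        # Bootstrap: draw |minority| majority samples with replacement.
        sampled = [rng.choice(majority) for _ in minority]
        idx = minority + sampled
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        stump = gini_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        score = g_mean(y, [stump_predict(stump, row) for row in X])
        scored.append((score, stump))
    # Selection: keep only the best-scoring base classifiers.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:keep]

def predict(ensemble, row):
    """Weighted vote; each classifier's weight is its G-mean score (assumption)."""
    total = sum(w for w, _ in ensemble)
    vote = sum(w for w, s in ensemble if stump_predict(s, row) == 1)
    return int(2 * vote >= total)
```

A call such as `fit_selective_ensemble(X, y)` returns a list of `(weight, stump)` pairs that `predict` combines by weighted vote. Scoring on the training data is only a stand-in for a proper generalization-error estimate; a held-out set or out-of-bag samples would be the more careful choice.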