Online ensemble learning algorithm for imbalanced data stream

Hongle Du, Yan Zhang, Ke Gang, Lin Zhang, Yeh-Cheng Chen
DOI: https://doi.org/10.1016/j.asoc.2021.107378
IF: 8.7
2021-08-01
Applied Soft Computing
Abstract: In many practical applications, a complete training set cannot be collected at one time, so a classifier trained once adapts poorly to new data. Online ensemble learning addresses this problem, but most data streams are imbalanced, and class imbalance substantially degrades the performance of online ensemble learning algorithms. To reduce this impact, the paper proposes a cost-sensitive online ensemble learning algorithm for imbalanced data streams. The algorithm applies several balancing methods: construction of the initial base classifiers, dynamic calculation of misclassification costs, sampling of instances from the data stream, and calculation of base-classifier weights. Together these methods reduce the influence of imbalance, and the experimental results show that the proposed algorithm achieves better classification performance on imbalanced data streams. Finally, the algorithm is applied to network intrusion detection. In simulation experiments on the NSL-KDD data set, it reduces both the missed-alarm rate and the false-alarm rate and improves detection accuracy, especially the recognition rate for unknown intrusion behavior.
computer science, artificial intelligence, interdisciplinary applications
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper aims to address the issue of performance degradation in online ensemble learning algorithms when dealing with imbalanced data streams. Specifically:

1. **Impact of Imbalanced Data Streams**: In many real-world applications, it is not possible to collect a complete training dataset at once, leading to poor adaptability of classifiers. Online ensemble learning can effectively address this issue, but most data streams are imbalanced, which severely affects the performance of online ensemble learning algorithms.
2. **Proposing a Cost-Sensitive Online Ensemble Learning Algorithm**: To mitigate the impact of imbalanced data streams, the paper proposes a cost-sensitive online ensemble learning algorithm. This algorithm employs various balancing methods, including constructing initial base classifiers, dynamically calculating misclassification costs, sampling methods in data streams, and calculating the weights of base classifiers. These methods can reduce the impact of imbalanced data streams and improve classification performance under imbalanced data streams.
3. **Application in Network Intrusion Detection**: Finally, the algorithm is applied to network intrusion detection. Simulation experiments on the NSL-KDD dataset show that the algorithm can reduce the false negative rate and false positive rate, improve detection accuracy, and particularly enhance the recognition rate of unknown intrusion behaviors.

In summary, this paper proposes a new online ensemble learning algorithm to improve classification performance under imbalanced data streams and validates its effectiveness in network intrusion detection tasks.
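
To make the components named above more concrete (dynamic misclassification cost, cost-driven sampling, and base-classifier weighting), here is a minimal Python sketch. It is not the authors' algorithm, whose formulas are not given in the abstract. It assumes a generic Poisson-based online bagging scheme in which the resampling rate grows with a cost derived from time-decayed class frequencies, and in which each base learner votes with the geometric mean of its decayed per-class recalls. Every name here (`CentroidLearner`, `CostSensitiveOnlineBagging`, `make_stream`, `decay`, `max_cost`) is hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


class CentroidLearner:
    """Toy incremental base learner: predicts the class with the nearest running mean."""

    def __init__(self):
        self.count = defaultdict(int)
        self.mean = {}

    def learn(self, x, y):
        self.count[y] += 1
        if y not in self.mean:
            self.mean[y] = list(x)
            return
        n = self.count[y]
        self.mean[y] = [m + (xi - m) / n for m, xi in zip(self.mean[y], x)]

    def predict(self, x):
        if not self.mean:
            return None  # nothing learned yet
        return min(self.mean,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(self.mean[c], x)))


class CostSensitiveOnlineBagging:
    """Assumed scheme for illustration only, not the method from the paper."""

    def __init__(self, n_learners=10, decay=0.99, max_cost=20.0, seed=0):
        self.learners = [CentroidLearner() for _ in range(n_learners)]
        self.decay = decay
        self.max_cost = max_cost
        self.rng = random.Random(seed)
        self.class_share = defaultdict(float)  # time-decayed class frequencies
        # Time-decayed per-class recall of each learner, used for vote weights.
        self.recall = [defaultdict(lambda: 0.5) for _ in range(n_learners)]

    def _poisson(self, lam):
        # Knuth's multiplication method for sampling Poisson(lam).
        threshold, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= threshold:
                return k
            k += 1

    def cost(self, y):
        """Dynamic misclassification cost: the rarer the class, the higher the cost."""
        share = self.class_share[y]
        return max(self.class_share.values()) / share if share > 0 else 1.0

    def _weight(self, i):
        # Geometric mean of per-class recalls, so ignoring a class is penalized.
        recalls = [self.recall[i][c] for c in self.class_share]
        return math.prod(recalls) ** (1.0 / len(recalls)) if recalls else 1.0

    def predict(self, x):
        votes = defaultdict(float)
        for i, learner in enumerate(self.learners):
            label = learner.predict(x)
            if label is not None:
                votes[label] += self._weight(i)
        return max(votes, key=votes.get) if votes else None

    def learn(self, x, y):
        for c in list(self.class_share):
            self.class_share[c] *= self.decay
        self.class_share[y] += 1.0 - self.decay
        lam = min(self.cost(y), self.max_cost)  # cap keeps resampling bounded
        for i, learner in enumerate(self.learners):
            # Test-then-train: score the learner before it sees the sample.
            hit = 1.0 if learner.predict(x) == y else 0.0
            self.recall[i][y] = self.decay * self.recall[i][y] + (1 - self.decay) * hit
            for _ in range(self._poisson(lam)):
                learner.learn(x, y)


def make_stream(n, period=20, seed=0):
    """Hypothetical toy stream: every `period`-th sample belongs to minority class 1."""
    rng = random.Random(seed)
    for i in range(n):
        y = 1 if i % period == 0 else 0
        center = 3.0 if y == 1 else 0.0
        yield [rng.gauss(center, 0.5), rng.gauss(center, 0.5)], y
```

A typical use would feed the stream one sample at a time, for example `for x, y in make_stream(1000): model.learn(x, y)`, and call `model.predict(x)` whenever a label is needed. The cap on the Poisson rate is a design choice of this sketch: without it, a very rare class could trigger an unbounded number of replays of a single sample.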