A network anomaly detection algorithm based on semi-supervised learning and adaptive multiclass balancing
Hao Zhang, Zude Xiao, Jason Gu, Yanhua Liu
DOI: https://doi.org/10.1007/s11227-023-05474-y
IF: 3.3
2023-06-15
The Journal of Supercomputing
Abstract: With the rapid development of network technology, the Internet has brought significant convenience to many sectors of society and now plays a central role in them. Because malicious attacks can have unpredictable and severe consequences, the detection of anomalous network traffic has received considerable attention from researchers over the past few decades. Network traffic is generated rapidly and in massive volumes, so accurately labeling enough of it to form a training dataset within a short period of time is challenging. Furthermore, malicious attack traffic makes up only a small proportion of overall traffic, and the amount of traffic also varies significantly across different types of malicious attacks. To address these challenges, this paper presents a novel network anomaly detection algorithm based on semi-supervised learning and adaptive multiclass balancing. Building on the assumption that labeled and unlabeled data follow a consistent distribution, this paper introduces a multiclass split balancing strategy and an adaptive confidence threshold function to tackle the multiclass imbalance in traffic data. By leveraging the mutually beneficial relationship between semi-supervised learning and ensemble learning, this paper also presents the collaborative rotation forest algorithm, which is specifically designed to enhance anomaly detection performance in environments where labels are scarce. Comparative experiments conducted on the NSL-KDD, UNSW-NB15, and ToN-IoT datasets demonstrate that the proposed algorithm achieves significant improvements in performance. Specifically, it enhances precision by 1.5–5.7%, recall by 1.5–5.7%, and F-Measure by 1.4–4.3% compared to state-of-the-art algorithms.
computer science, theory & methods; engineering, electrical & electronic; hardware & architecture