SOUL: A Semi-supervised Open-world continUal Learning method for Network Intrusion Detection

Suresh Kumar Amalapuram, Shreya Kumar, Bheemarjuna Reddy Tamma, Sumohana Channappayya
2024-12-02
Abstract: Fully supervised continual learning methods have shown improved attack traffic detection in a closed-world learning setting. However, obtaining fully annotated data is an arduous task in the security domain. Further, our research finds that after training a classifier on two days of network traffic, the performance decay of attack class detection over time (computed using the area under the time on precision-recall AUC of the attack class) drops from 0.985 to 0.506 on testing with three days of new test samples. In this work, we focus on label scarcity and open-world learning (OWL) settings to improve the attack class detection of the continual learning-based network intrusion detection (NID). We formulate OWL for NID as a semi-supervised continual learning-based method, dubbed SOUL, to achieve the classifier performance on par with fully supervised models while using limited annotated data. The proposed method is motivated by our empirical observation that using gradient projection memory (constructed using buffer memory samples) can significantly improve the detection performance of the attack (minority) class when trained using partially labeled data. Further, using the classifier's confidence in conjunction with buffer memory, SOUL generates high-confidence labels whenever it encounters OWL tasks closer to seen tasks, thus acting as a label generator. Interestingly, SOUL efficiently utilizes samples in the buffer memory for sample replay to avoid catastrophic forgetting, construct the projection memory, and assist in generating labels for unseen tasks. The proposed method is evaluated on four standard network intrusion detection datasets, and the performance results are closer to the fully supervised baselines using at most 20% labeled data while reducing the data annotation effort in the range of 11 to 45% for unseen data.
Cryptography and Security
What problem does this paper attempt to address?
This paper addresses two main problems in network intrusion detection systems (NIDS):

1. **Label scarcity**: In network security, obtaining fully labeled data is costly and labor-intensive. Existing network intrusion detection methods usually need a large amount of labeled data to train models, which is hard to provide in practice. The paper proposes a semi-supervised continual learning (SSCL) method, SOUL, that aims to improve model performance using limited labeled data.
2. **Performance degradation under open-world learning (OWL)**: Existing continual learning methods perform well on known attacks, but their performance drops sharply when they must identify newly emerging, unknown attacks (zero-day attacks). The paper shows experimentally that attack-class detection performance on unseen traffic decays over time: from 0.985 to 0.506 on the CICIDS-2017 dataset, and from 0.999 to 0.157 on the CSE-CICIDS-2018 dataset.

To address these problems, the paper proposes the SOUL method. Its main contributions are:

1. **A semi-supervised continual learning method for open-world learning**: SOUL adapts to unseen network traffic under label scarcity, improving attack-class detection.
2. **High-confidence label generation**: SOUL combines buffer memory with the classifier's confidence to generate high-confidence labels when it encounters unseen tasks that are similar to seen tasks, reducing the dependence on manually labeled data.
3. **Performance evaluation**: The paper evaluates SOUL on four standard network intrusion detection datasets. Using at most 20% of the labeled data, SOUL performs close to the fully supervised baselines, and the annotation effort for unseen data is reduced by 11% to 45%.

Through these contributions, SOUL improves attack-class detection under label scarcity and reduces the cost and effort of data labeling.
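The label-generation idea described above (classifier confidence used together with buffer memory) can be illustrated with a minimal sketch. This is not the paper's algorithm: the acceptance rule, the function name `generate_pseudo_labels`, the `conf_threshold` parameter, and the nearest-neighbor agreement check against buffer samples are all assumptions chosen for illustration. The sketch accepts a pseudo-label only when the classifier is confident and the closest buffer-memory sample carries the same label. All other samples are left for manual annotation.

```python
import numpy as np


def generate_pseudo_labels(probs, features, buffer_features, buffer_labels,
                           conf_threshold=0.95):
    """Return (labels, accepted) for a batch of unlabeled samples.

    Hypothetical rule (an assumption, not taken from the paper): a sample
    gets a pseudo-label only if the classifier's top probability reaches
    conf_threshold AND its nearest buffer-memory sample has the same label.
    """
    probs = np.asarray(probs, dtype=float)
    features = np.asarray(features, dtype=float)
    buffer_features = np.asarray(buffer_features, dtype=float)
    buffer_labels = np.asarray(buffer_labels)

    predicted = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= conf_threshold

    # Pairwise Euclidean distances, shape (n_unlabeled, n_buffer).
    diffs = features[:, None, :] - buffer_features[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest_label = buffer_labels[dists.argmin(axis=1)]

    accepted = confident & (nearest_label == predicted)
    # -1 marks samples that still need manual annotation.
    labels = np.where(accepted, predicted, -1)
    return labels, accepted


# Toy data: class 0 = benign, class 1 = attack (illustrative values only).
probs = np.array([[0.98, 0.02], [0.55, 0.45], [0.03, 0.97], [0.99, 0.01]])
features = np.array([[0.0, 0.1], [0.5, 0.5], [1.0, 0.9], [1.0, 1.0]])
buffer_features = np.array([[0.0, 0.0], [1.0, 1.0]])
buffer_labels = np.array([0, 1])

labels, accepted = generate_pseudo_labels(
    probs, features, buffer_features, buffer_labels
)
print(labels.tolist())  # [0, -1, 1, -1]
```

In this toy run, the second sample is rejected for low confidence and the fourth because the classifier's prediction disagrees with its nearest buffer sample. The fraction of accepted samples loosely corresponds to the annotation effort saved, though how the paper measures that saving is not specified here.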