Traffic Anomaly Detection Model Using K-Means and Active Learning Method
Niandong Liao, Xiaoxuan Li
DOI: https://doi.org/10.1007/s40815-022-01269-0
IF: 4.085
2022-03-23
International Journal of Fuzzy Systems
Abstract: As the digital world becomes the main complement to the physical world, establishing a solid line of defense against cyber attacks becomes both critical and arduous. Intrusion detection systems (IDSs) based on supervised learning have achieved excellent performance, but they require a large amount of labeled data in the training phase. However, attacks occur much less frequently than normal behaviors, and it is difficult to obtain accurate labels. In addition, IDSs based on supervised learning cannot identify unknown attacks. At the same time, the detection accuracy of traditional unsupervised learning methods varies greatly from one application to another. Therefore, it is necessary to perform high-precision anomaly detection on unlabeled samples. This paper proposes a traffic anomaly detection model using K-means and the Active Learning Method (ALM), which is mainly composed of a feature extraction module and an anomaly detection module. First, the feature extraction module uses the Pearson correlation coefficient and Light Gradient Boosting Machine (LightGBM) to select important features. Second, K-means divides the feature-processed traffic into normal and abnormal categories. Finally, the results of K-means are diffused through ALM, and new classification results are obtained after defuzzification, thereby improving the accuracy of anomaly detection. The CICDDoS2019 data set is used in the experiments. Experimental results show that the detection accuracy of the proposed model is above 90% and the F1 score is above 95%, both for binary classification of a single attack and for mixed classification of multiple attacks.
Compared with three unsupervised learning methods (K-means, Auto-encoder, and Long Short-Term Memory (LSTM)) and three supervised learning methods (Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT)), the proposed model has higher classification accuracy and generalizes better. This article is helpful for exploring how unsupervised learning methods can be applied in network intrusion detection systems based on the characteristics of the data itself.
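The first two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only a Pearson-correlation filter and a two-cluster K-means, and it omits the LightGBM importance ranking and the ALM diffusion and defuzzification steps, for which the abstract gives no details. The function names, the 0.95 correlation threshold, the farthest-point initialization, the toy flow features, and the rule that the smaller cluster is treated as abnormal are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / (sx * sy)

def drop_correlated(rows, threshold=0.95):
    """Return indices of columns kept after dropping near-duplicate features.

    A column is dropped when |r| with an already kept column reaches the
    threshold (the threshold value is an assumption, not from the paper).
    """
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) < threshold for k in kept):
            kept.append(j)
    return kept

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def two_means(points, max_iter=50):
    """Plain K-means with k=2 and a deterministic farthest-point start."""
    points = [tuple(p) for p in points]
    first = points[0]
    centers = [first, max(points, key=lambda p: sq_dist(p, first))]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min((0, 1), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def flag_minority(labels):
    """Mark the smaller cluster as abnormal (an illustrative assumption)."""
    minority = 1 if labels.count(1) <= labels.count(0) else 0
    return [lab == minority for lab in labels]

# Toy flows with three made-up features; the third is twice the first,
# so the correlation filter should discard it.
flows = [
    (1.0, 5.0, 2.0),
    (1.2, 4.0, 2.4),
    (0.9, 6.0, 1.8),
    (1.1, 5.5, 2.2),
    (1.0, 4.5, 2.0),
    (9.0, 5.2, 18.0),
    (9.5, 4.8, 19.0),
]
kept = drop_correlated(flows)
reduced = [tuple(row[j] for j in kept) for row in flows]
abnormal = flag_minority(two_means(reduced))
print(kept, abnormal)
```

On real traffic the features would normally be scaled before clustering, and the mapping from clusters to normal or abnormal would need a more careful rule than cluster size, since attack traffic can dominate a capture.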
computer science, information systems, automation & control systems, artificial intelligence