Self-Supervised Learning Malware Traffic Classification Based On Masked Auto-Encoder
Ke Xu, Xixi Zhang, Yu Wang, Tomoaki Ohtsuki, Bamidele Adebisi, Hikmet Sari, Guan Gui
DOI: https://doi.org/10.1109/jiot.2024.3357072
IF: 10.6
2024-01-01
IEEE Internet of Things Journal
Abstract: Malware traffic classification (MTC) is an important technique for securing cyberspace; it aims to detect anomalies and classify different types of network traffic. Recently, MTC methods based on deep learning (DL) have shown excellent performance. However, these DL-based methods rely on training datasets with manually labeled samples, which are costly and hard to obtain. To address this problem, this paper proposes a novel self-supervised MTC method based on the masked auto-encoder (MAE) framework. Specifically, MAE first constructs a reasonable unsupervised pretext task with a random masking strategy, which reduces the redundant information in samples and speeds up pre-training. The transformer-based backbone network then efficiently extracts features from the non-redundant traffic data. The proposed MTC-MAE method uses self-supervised learning on a large-scale unlabeled dataset to acquire unbiased features, and is then fine-tuned on specific datasets to adapt to diverse traffic classification scenarios. Simulation experiments show that the proposed MTC-MAE method learns high-quality universal features and achieves excellent classification performance on various downstream datasets. The datasets, code implementation, and pre-trained models are available on GitHub.
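The random masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name random_mask_patches, the 16-byte patch size, the 784-byte sample length, and the 75% mask ratio are all assumptions chosen for illustration, and the abstract does not state any of them. The sketch splits one traffic sample into fixed-size patches, keeps a random subset as encoder input, and returns the indices of the masked patches that a decoder would be asked to reconstruct.

```python
import numpy as np


def random_mask_patches(sample, patch_size=16, mask_ratio=0.75, rng=None):
    """Split a 1-D byte sequence into patches and randomly mask a fraction.

    Hypothetical helper for illustration only; patch_size and mask_ratio
    are assumed values, not taken from the paper.
    Returns (visible_patches, keep_idx, mask_idx).
    """
    rng = np.random.default_rng() if rng is None else rng
    sample = np.asarray(sample, dtype=np.uint8)
    if sample.ndim != 1 or sample.size % patch_size != 0:
        raise ValueError("sample must be 1-D with length divisible by patch_size")

    # One row per patch.
    patches = sample.reshape(-1, patch_size)
    num_patches = patches.shape[0]

    # Keep at least one patch visible so the encoder always has input.
    num_keep = max(1, int(round(num_patches * (1.0 - mask_ratio))))

    # A random permutation decides which patches stay visible.
    perm = rng.permutation(num_patches)
    keep_idx = np.sort(perm[:num_keep])
    mask_idx = np.sort(perm[num_keep:])

    # Only the visible patches are passed on, which shortens the input.
    return patches[keep_idx], keep_idx, mask_idx


# Example with a synthetic 784-byte sample (49 patches of 16 bytes).
rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=784).astype(np.uint8)
visible, keep_idx, mask_idx = random_mask_patches(sample, rng=rng)
print(visible.shape, len(mask_idx))
```

Under these assumed settings only about a quarter of the patches reach the encoder, which is the general mechanism by which masking shortens the input and can speed up pre-training.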
Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic