Flow-MAE: Leveraging Masked AutoEncoder for Accurate, Efficient and Robust Malicious Traffic Classification

Zijun Hang,Yi Xie,Yuliang Lu,Yongjie Wang
DOI: https://doi.org/10.1145/3607199.3607206
2023-10-16
Abstract:Malicious traffic classification is crucial for Intrusion Detection Systems (IDS). However, traditional Machine Learning approaches necessitate expert knowledge and a significant amount of well-labeled data. Although recent studies have employed pre-training models from the Natural Language Processing domain, such as ET-BERT, for traffic classification, their effectiveness is impeded by limited input length and fixed Byte Pair Encoding. To address these challenges, this paper presents Flow-MAE, a pre-training model that employs Masked AutoEncoders (MAE) from the Computer Vision domain to achieve accurate, efficient, and robust malicious network traffic classification. Flow-MAE overcomes these challenges by utilizing burst (a generic representation of network traffic) and patch embedding to accommodate extensive traffic length. Moreover, Flow-MAE introduces a self-supervised pre-training task, the Masked Patch Model, which captures unbiased representations from bursts with varying lengths and patterns. Experimental results from six datasets reveal that Flow-MAE achieves new state-of-the-art accuracy (>0.99), efficiency (>900 samples/s), and robustness across diverse network traffic types. In comparison to the state-of-the-art ET-BERT, Flow-MAE exhibits improvements in accuracy and speed by 0.41%-1.93% and 7.8x-10.3x, respectively, while necessitating only 0.2% FLOPs and 44% memory overhead. The efficacy of the core designs is validated through few-shot learning and ablation experiments. The code is publicly available at https://github.com/NLear/Flow-MAE.
Computer Science
What problem does this paper attempt to address?