Word Embedding-based Context-sensitive Network Flow Payload Anomaly Detection

Yizhou Li, Yijie Wang, Li Cheng, Hongzuo Xu
DOI: https://doi.org/10.1109/icaml54311.2021.00048
2021-01-01
Abstract: Payload anomaly detection can uncover malicious behavior hidden in network packets. Payloads are hard to handle because they can contain a wide variety of characters and carry complex semantic context, so identifying abnormal payloads is a non-trivial task. Prior work relies only on n-gram language models to extract features, which leads directly to an ultra-high-dimensional feature space and still fails to fully capture contextual semantics. Accordingly, this paper proposes WECAD, a word embedding-based, context-sensitive network flow payload anomaly detection method. First, WECAD obtains an initial feature representation of the payload through a word embedding-based method. Then, we propose a corpus pruning algorithm that applies cosine similarity clustering and the frequency distribution to prune inconsequential characters, keeping only the essential characters to reduce the computation space. Subsequently, we propose a context learning algorithm that applies a co-occurrence matrix transformation and introduces a backward step size to account for the order relationships among essential characters. Comprehensive experiments on real-world intrusion detection datasets validate the effectiveness of our method.
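The pruning and context learning steps can be illustrated with a minimal sketch. The abstract does not give WECAD's actual formulas, so everything below is an assumption. Payloads are treated as raw bytes. The hypothetical `prune_by_frequency` covers only the frequency-distribution half of corpus pruning; the cosine similarity clustering and the word embedding step are omitted. The hypothetical `backward_cooccurrence` and its `step` parameter are one possible reading of the "backward step size". Each matrix cell counts how often one essential byte appears within `step` positions before another. Because of this, the matrix is generally not symmetric and retains order information.

```python
from collections import Counter

import numpy as np


def prune_by_frequency(payloads, keep_ratio=0.9):
    """Keep the most frequent byte values that together cover `keep_ratio` of all bytes.

    Hypothetical stand-in for the frequency-distribution part of corpus
    pruning; the cosine-similarity clustering part is not modeled here.
    """
    counts = Counter(b for p in payloads for b in p)  # iterating bytes yields ints
    total = sum(counts.values())
    kept, covered = [], 0
    for byte, c in counts.most_common():
        if covered >= keep_ratio * total:
            break
        kept.append(byte)
        covered += c
    return sorted(kept)


def backward_cooccurrence(payload, vocab, step=2):
    """Order-aware co-occurrence counts over essential bytes (assumed formulation).

    m[i, j] counts how often vocab[j] occurs within `step` positions *before*
    an occurrence of vocab[i]. Pruned bytes are dropped first, so positions
    are counted in the pruned sequence (also an assumption).
    """
    index = {b: k for k, b in enumerate(vocab)}
    seq = [index[b] for b in payload if b in index]
    m = np.zeros((len(vocab), len(vocab)), dtype=np.int64)
    for pos, cur in enumerate(seq):
        for back in range(1, step + 1):
            if pos - back < 0:
                break
            m[cur, seq[pos - back]] += 1
    return m


def payload_features(payload, vocab, step=2):
    """Flatten the co-occurrence matrix into a normalized feature vector."""
    m = backward_cooccurrence(payload, vocab, step)
    total = m.sum()
    return (m / total if total else m.astype(float)).ravel()


if __name__ == "__main__":
    payloads = [b"GET /index.html HTTP/1.1", b"GET /login.php?user=admin HTTP/1.1"]
    vocab = prune_by_frequency(payloads, keep_ratio=0.8)
    feats = payload_features(payloads[1], vocab, step=2)
    print(len(vocab), feats.shape)
```

The size of the resulting feature vector is bounded by the vocabulary. A byte-level 3-gram model has 256**3 = 16,777,216 possible features. The matrix above has len(vocab)**2 entries, which is at most 65,536 even with no pruning. This contrast is one way to read the dimensionality argument in the abstract.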