Malware Detection Using CNN Via Word Embedding

Jie Zhang, Lin Yan, Rong Wang, Cong Tian, Zhenhua Duan
DOI: https://doi.org/10.1109/dsa52907.2021.00087
2021-01-01
Abstract: Malware has long been an enormous threat to computer and network security, and the main defensive strategy at present is malware detection based on feature extraction. In existing work, most feature extraction techniques rely on byte N-gram patterns, binary strings for representing log files, or other static features. This paper proposes a new feature extraction method and adopts word embedding (GloVe) to express the information extracted from program files. As a result, the corresponding Vector Space Model (VSM) incorporates more information about unknown programs. We use a Convolutional Neural Network (CNN) to analyze the feature maps represented by word embedding and apply Softmax to estimate the probability that a program is malicious. A program is classified as malicious if this probability is greater than 0.5; otherwise it is classified as benign. Experimental results show that our approach achieves an accuracy higher than 98%.
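The pipeline described in the abstract (embedding lookup, convolution over the resulting feature map, Softmax, then a 0.5 threshold) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the token names, the embedding dimension, the number and width of the kernels, and the randomly initialized weights are all assumptions made for illustration, and the random table merely stands in for trained GloVe vectors.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d_max_pool(feature_map, kernel, bias=0.0):
    """Slide one (k x d) kernel over an (n x d) feature map,
    apply ReLU, and keep the maximum activation over all positions."""
    k = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(feature_map) - k + 1):
        act = bias
        for i in range(k):
            act += sum(w * x for w, x in zip(kernel[i], feature_map[start + i]))
        best = max(best, act)
    return best


def classify(tokens, embeddings, unk, kernels, dense_w, dense_b):
    """Return (label, p_malicious) for a sequence of extracted tokens."""
    # Each token becomes one row of the feature map.
    feature_map = [embeddings.get(t, unk) for t in tokens]
    pooled = [conv1d_max_pool(feature_map, kern) for kern in kernels]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(dense_w, dense_b)]
    probs = softmax(logits)  # assumed order: [p_benign, p_malicious]
    label = "malicious" if probs[1] > 0.5 else "benign"
    return label, probs[1]


# --- Hypothetical demo data (not from the paper) ---
rng = random.Random(0)
DIM = 4  # assumed embedding size; real GloVe vectors are much larger


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


vocab = ["push", "mov", "call", "CreateFileA", "ret"]  # hypothetical tokens
embeddings = {tok: rand_vec(DIM) for tok in vocab}     # stand-in for GloVe
unk = [0.0] * DIM                                      # out-of-vocabulary vector
kernels = [[rand_vec(DIM) for _ in range(2)] for _ in range(3)]  # 3 kernels, width 2
dense_w = [rand_vec(3), rand_vec(3)]                   # 2 classes x 3 pooled features
dense_b = [0.0, 0.0]

label, p_malicious = classify(
    ["push", "mov", "call", "CreateFileA", "unknown_token", "ret"],
    embeddings, unk, kernels, dense_w, dense_b,
)
```

With untrained weights the resulting label is meaningless; the sketch only shows how the stages connect. In practice the embedding table would be trained with GloVe on tokens extracted from program files, and the convolution and dense weights would be learned from labeled samples.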