Malware classification based on heterogeneous information network representation learning

Yu Chen, Bin Qin, Changchun Ma, Ming Xu
DOI: https://doi.org/10.1109/ICBAIE49996.2020.00018
2020-01-01
Abstract: In recent years, the growing complexity of malware and the serious threat it poses to system security have left the anti-malware industry and researchers in urgent need of new techniques for preventing malicious attacks. To this end, this paper proposes a new malware detection method that describes malware using both content-based and relationship-based features. First, different types of entities (i.e., PE file, API, DLL, Registry, Mutex) and the rich semantic relationships among them (i.e., PE File-API, PE File-DLL, PE File-Registry, PE File-Mutex, DLL-API) are modeled as a heterogeneous information network (HIN). Then, guided by the constructed meta-path scheme, the metapath2vec embedding model learns low-dimensional vector representations of the HIN that capture its structure and semantic relationships. Finally, a Convolutional Neural Network (CNN) is designed to classify the learned HIN representations. Experimental results show that the method achieves 93% accuracy.
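The HIN and meta-path steps described in the abstract can be illustrated with a minimal sketch. It builds a toy typed graph and generates a meta-path-guided random walk of the kind that metapath2vec feeds to a skip-gram model. Everything here is an assumption made for illustration: the node names, the edges, the helper functions `node_type`, `build_hin`, and `metapath_walk`, and the PE-API-PE meta-path itself, since the abstract does not state which meta-paths the authors used.

```python
import random

# Hypothetical toy data: every node name and edge below is invented.
# The prefix before ":" encodes the entity type (pe, api, dll, registry, mutex).
EDGES = [
    ("pe:a.exe", "api:CreateFileW"),
    ("pe:b.exe", "api:CreateFileW"),
    ("pe:a.exe", "dll:kernel32.dll"),
    ("pe:b.exe", "dll:kernel32.dll"),
    ("dll:kernel32.dll", "api:CreateFileW"),
    ("pe:a.exe", "mutex:m1"),
    ("pe:b.exe", "registry:HKLM\\Run"),
]


def node_type(node):
    """Return the entity type encoded in the node name prefix."""
    return node.split(":", 1)[0]


def build_hin(edges):
    """Build an undirected typed adjacency map: node -> neighbor type -> neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, {}).setdefault(node_type(v), []).append(v)
        adj.setdefault(v, {}).setdefault(node_type(u), []).append(u)
    return adj


def metapath_walk(adj, start, metapath, walk_length, rng):
    """Random walk whose node types follow a symmetric meta-path.

    `metapath` is a list of types whose first and last entries match,
    e.g. ["pe", "api", "pe"], so the pattern can repeat. The walk stops
    early if the current node has no neighbor of the required type.
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same type")
    if node_type(start) != metapath[0]:
        raise ValueError("start node type does not match the meta-path")
    cycle = metapath[:-1]  # repeating unit of the type pattern
    walk = [start]
    while len(walk) < walk_length:
        next_type = cycle[len(walk) % len(cycle)]
        candidates = adj.get(walk[-1], {}).get(next_type, [])
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


hin = build_hin(EDGES)
walk = metapath_walk(hin, "pe:a.exe", ["pe", "api", "pe"], 5, random.Random(0))
print(walk)
```

In a full pipeline, many such walks per PE file would serve as the training corpus for a skip-gram model, and the resulting vectors would be the input to the classifier. This sketch stops at walk generation and does not reproduce the paper's embedding or CNN stages.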