IoT Malware Classification Based on Reinterpreted Function-Call Graphs

Chia-Yi Wu, Tao Ban, Shin-Ming Cheng, Takeshi Takahashi, Daisuke Inoue
DOI: https://doi.org/10.1016/j.cose.2022.103060
2022-12-14
Abstract: Malware and cyberattacks have proliferated alongside IoT devices. The evolving malware that targets these devices calls for effective and efficient solutions to protect vulnerable IoT devices from compromise. In this paper, we investigate the feasibility of using a state-of-the-art graph embedding method, graph2vec, for family classification of IoT malware, and report promising results. To further improve the generalization performance of classifiers based on graph2vec-extracted features, we propose two new mechanisms that improve the quality of the feature representation. First, we unify user-defined function calls by reinterpreting their opcode sequences, to better capture the semantics of the function-call relationships in malware binaries. Second, we integrate literal information into the graph2vec embedding of the function-call graph to achieve better discriminant ability. To demonstrate the effectiveness of the proposed scheme, we carried out a performance comparison on a large-scale dataset containing more than 108K malware binaries collected from seven CPU architectures. Adopting the two proposed mechanisms improves the accuracy rates of five widely adopted classifiers on malware family classification by 2% on average. Specifically, when combined with the proposed approach, the support vector machine classifier obtained an accuracy rate of 98.88% on malware family classification, outperforming known function-call-graph (FCG)-based methods and previous work on static malware analysis.
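The idea of unifying user-defined function calls can be illustrated with a minimal sketch. The abstract does not specify how opcode sequences are reinterpreted, so everything here is an assumption made for illustration: the input layout, the helper names `opcode_label` and `relabel_fcg`, the use of a SHA-256 hash as the unified label, and the `sub_` prefix used to recognize user-defined functions. The sketch only shows why content-derived labels make function-call graphs comparable across binaries; it is not the authors' implementation.

```python
import hashlib


def opcode_label(opcodes):
    """Return a short, stable label derived from an opcode sequence.

    Assumption: hashing stands in for the paper's reinterpretation step,
    which the abstract does not describe in detail.
    """
    digest = hashlib.sha256(" ".join(opcodes).encode("utf-8")).hexdigest()
    return "f_" + digest[:12]


def relabel_fcg(functions, is_user_defined):
    """Relabel user-defined nodes of a function-call graph.

    functions: dict mapping a function name to
               {"opcodes": [str, ...], "calls": [callee name, ...]}
               (hypothetical layout chosen for this sketch).
    is_user_defined: predicate deciding which names get a content label.
    Returns the sorted list of (caller_label, callee_label) edges.
    """
    labels = {}
    for name, info in functions.items():
        if is_user_defined(name):
            labels[name] = opcode_label(info["opcodes"])
        else:
            # Library or system calls keep their original names.
            labels[name] = name

    edges = set()
    for name, info in functions.items():
        for callee in info["calls"]:
            # Callees without an entry of their own keep their raw name.
            edges.add((labels[name], labels.get(callee, callee)))
    return sorted(edges)


# Two binaries whose user-defined function has the same body
# but a different auto-generated name.
body = ["push", "mov", "call", "ret"]
binary_a = {"sub_1000": {"opcodes": body, "calls": ["socket"]},
            "socket": {"opcodes": [], "calls": []}}
binary_b = {"sub_2000": {"opcodes": body, "calls": ["socket"]},
            "socket": {"opcodes": [], "calls": []}}


def is_user(name):
    # Assumed naming convention for user-defined functions.
    return name.startswith("sub_")


edges_a = relabel_fcg(binary_a, is_user)
edges_b = relabel_fcg(binary_b, is_user)
```

With raw names, the two graphs share no caller label. After relabeling, `edges_a` and `edges_b` are identical, so a graph embedding method that treats node labels as vocabulary sees the same structure in both binaries.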
computer science, information systems