APIBeh: Learning Behavior Inclination of APIs for Malware Classification
Lei Cui, Yiran Zhu, Junnan Yin, Zhiyu Hao, Wei Wang, Peng Liu, Ziqi Yang, Xiaochun Yun
DOI: https://doi.org/10.1109/issre62328.2024.00012
2024-01-01
Abstract: Malware classification involves categorizing malware samples based on their characteristics. While deep learning techniques applied to malware execution traces, mainly API calls, have shown potential in this field, they still perform poorly. This is primarily because they treat all APIs equally and train classifiers directly on native APIs, which inadequately capture the underlying family-related semantics. In this paper, we first investigate the behaviors of multiple malware families and observe that different families exhibit divergent behaviors, with each family consistently favoring certain behaviors over time. Motivated by this, we propose APIBeh, a new embedding method designed to enhance malware classification. APIBeh first uses the Benignity Degree Algorithm to identify and exclude insignificant, likely benign APIs from sequences. Then, it introduces the concept of Behavior Inclination, which quantifies the association between an API and malicious behaviors, facilitating high-level behavior encoding for each API. This Behavior Inclination embedding is then concatenated with the raw embedding to represent an API and fed into a DL model for classifier training. Experimental results show that APIBeh outperforms existing embedding methods in classification performance, e.g., a 3.18% boost in weighted F1-score over a recent study using word2vec. In addition, it offers robustness to concept drift and adversarial attacks.
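The pipeline outlined in the abstract (filter likely benign APIs, then concatenate a Behavior Inclination vector with a raw embedding) can be illustrated with a minimal sketch. The abstract does not define how benignity scores or Behavior Inclination values are computed, so the scores, the threshold, the vector sizes, and all function names below are hypothetical placeholders, not the authors' implementation.

```python
from typing import Dict, List, Sequence


def filter_sequence(
    api_calls: Sequence[str],
    benignity: Dict[str, float],
    threshold: float,
) -> List[str]:
    """Keep only APIs whose (assumed) benignity score is below the threshold."""
    return [api for api in api_calls if benignity.get(api, 0.0) < threshold]


def embed_api(
    api: str,
    raw_embedding: Dict[str, List[float]],
    behavior_inclination: Dict[str, List[float]],
) -> List[float]:
    """Represent one API as raw embedding concatenated with its inclination vector."""
    return list(raw_embedding[api]) + list(behavior_inclination[api])


def embed_sequence(
    api_calls: Sequence[str],
    benignity: Dict[str, float],
    raw_embedding: Dict[str, List[float]],
    behavior_inclination: Dict[str, List[float]],
    threshold: float = 0.5,
) -> List[List[float]]:
    """Filter the trace, then embed each remaining API call."""
    kept = filter_sequence(api_calls, benignity, threshold)
    return [embed_api(api, raw_embedding, behavior_inclination) for api in kept]


# Toy values, invented purely for illustration.
benignity = {"GetTickCount": 0.9, "CreateRemoteThread": 0.1, "WriteProcessMemory": 0.2}
raw = {
    "GetTickCount": [0.0, 0.0],
    "CreateRemoteThread": [0.1, 0.2],
    "WriteProcessMemory": [0.3, 0.4],
}
inclination = {
    "GetTickCount": [0.0, 0.0, 0.0],
    "CreateRemoteThread": [0.8, 0.1, 0.1],
    "WriteProcessMemory": [0.7, 0.2, 0.1],
}
trace = ["GetTickCount", "CreateRemoteThread", "WriteProcessMemory"]

vectors = embed_sequence(trace, benignity, raw, inclination, threshold=0.5)
```

In this sketch the resulting per-call vectors would be passed to a sequence classifier; the choice of model and the real scoring procedures are left to the paper itself.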