A Simhash-Based Integrative Features Extraction Algorithm for Malware Detection

Yihong Li,Fangzheng Liu,Zhenyu Du,Dubing Zhang
DOI: https://doi.org/10.3390/a11080124
2018-01-01
Algorithms
Abstract:In the malware detection process, obfuscated malicious codes cannot be efficiently and accurately detected solely in the dynamic or static feature space. Aiming at this problem, an integrative feature extraction algorithm based on simhash was proposed, which combines the static information e.g., API (Application Programming Interface) calls and dynamic information (such as file, registry and network behaviors) of malicious samples to form integrative features. The experiment extracts the integrative features of some static information and dynamic information, and then compares the classification, time and obfuscated-detection performance of the static, dynamic and integrated features, respectively, by using several common machine learning algorithms. The results show that the integrative features have better time performance than the static features, and better classification performance than the dynamic features, and almost the same obfuscated-detection performance as the dynamic features. This algorithm can provide some support for feature extraction of malware detection.
What problem does this paper attempt to address?