Identifying Parasitic Malware As Outliers by Code Clustering
Hongcheng Li,Jianjun Huang,Bin Liang,Wenchang Shi,Yifang Wu,Shilei Bai
DOI: https://doi.org/10.3233/jcs-191313
2020-01-01
Journal of Computer Security
Abstract:Injecting malicious code into benign programs is popular in spreading malware. Unfortunately, for detection, the prior knowledge about the malware, e.g., the behavior or implementation patterns, isn’t always available. Our observation shows that the logic of the host program is normally unclear to parasitic malware developers, resulting in very few interactions between the host and the payloads in lots of parasitic malware. Thus we can expose the injected part by grouping the code based on the interactive relations. Particularly, we partition a target program into modules, extract the relations, cluster the modules and further inspect the outliers to identify such malware. In this paper, we design a two-stage code clustering-based approach to detecting two representative types of malware, the UEFI rootkits and the piggybacked Android applications. Parasitic malware is reported when (1) any outlier in a UEFI firmware shows a relatively long distance to the largest cluster, or (2) the largest outlier distance exceeds zero in an Android application, i.e., multiple cluster exist after re-clustering outliers. We evaluate the approach on 35 pairs of benign/infected UEFI samples we do our best to get and achieve an overall F1 score. of 100%. Applying the learned threshold to 50 other benign firmwares, we identify them without false positives. In addition, our evaluation on 1079 pairs of Android applications, shows an F1 score of 90.66% when the third-party libraries are eliminated and a score of 87.36% if we keep the popular third-party libraries, demonstrating the effectiveness of the approach.