Enhancing State-of-the-art Classifiers with API Semantics to Detect Evolved Android Malware

Xiaohan Zhang, Yuan Zhang, Ming Zhong, Daizong Ding, Yinzhi Cao, Yukun Zhang, Mi Zhang, Min Yang
DOI: https://doi.org/10.1145/3372297.3417291
2020-10-30
Abstract: Machine learning (ML) classifiers have been widely deployed to detect Android malware, but their application faces an emerging problem: the performance of such classifiers degrades (or "ages") significantly over time as malware evolves. Prior works have proposed retraining or active learning to reverse aging and improve aged models. However, the underlying classifier itself remains blind to malware evolution. Unsurprisingly, such evolution-insensitive retraining or active learning comes at a price, i.e., the labeling of tens of thousands of malware samples and significant human effort. In this paper, we propose the first framework, called APIGraph, to enhance state-of-the-art malware classifiers with the similarity information among evolved Android malware in terms of semantically-equivalent or similar API usages, thus naturally slowing down classifier aging. Our evaluation shows that, because classifier aging slows down, APIGraph significantly reduces the human effort that active learning requires to label new malware samples.