Android Malware Family Labeling: Perspectives from the Industry
Liu Wang, Haoyu Wang, Tao Zhang, Haitao Xu, Guozhu Meng, Peiming Gao, Chen Wei, Yi Wang
DOI: https://doi.org/10.1145/3691620.3695280
2024-01-01
Abstract: Labeling and classifying Android malware is important for identifying new threats, triaging security incidents, and demystifying evasion techniques. To automate the malware classification pipeline, state-of-the-art tools such as AVClass and Euphony unify raw labels from commercial antivirus vendors (as aggregated by VirusTotal) to produce family labels. These tools are widely used for automatic malware classification in both academic research and industry practice. However, they face significant limitations in real-world industrial scenarios, where samples are numerous and change dynamically. For example, our industrial practice revealed that VirusTotal's results change over time, leading to temporal inconsistencies in family labels produced by label unification, which can severely impact a company's security posture. Despite this, such issues and challenges remain understudied. In this paper, we present the first systematic measurement study of existing automatic Android malware family labeling systems, covering aspects such as label dynamics, consistency, and reliability. Based on a large-scale dataset, we validate that the labeling results of these systems do evolve over time, and that this evolution can introduce bias into many previous performance assessments. We also reveal substantial divergence in labeling decisions across different systems given the same input. In addition, we identify a disclosure priority among families in these systems' labeling processes, which could threaten the industry by allowing malicious actors to exploit these discrepancies. Our findings can help both researchers and industry practitioners refine automatic malware family labeling systems and support their practical application.