GMADV: an Android Malware Variant Generation and Classification Adversarial Training Framework

Shuangcheng Li,Zhangguo Tang,Huanzhou Li,Jian Zhang,Han Wang,Junfeng Wang
DOI: https://doi.org/10.1016/j.jisa.2024.103800
IF: 4.96
2024-01-01
Journal of Information Security and Applications
Abstract:Android malware uses anti-reverse analysis and APK shelling technology, which leads to the failure of the classification method based on decompiled features and the reduction of the classification accuracy based on single file features. Moreover, the lack of samples in some families of Android malware makes the classification model based on sample learning ineffective. To solve the above problems, this paper proposes a two-layer general framework for Android malware classification and adversarial training named GMADV, which enhances classifier performance through adversarial training. In the sample classification layer, based on the transformation method of the Markov model, it is proposed for the first time to convert the three files in the APK into RGB Markov images, and use VGG13 to automatically extract features and classification; In the variant amplification layer, the idea of "regression for generation" is firstly proposed, and GMM-GAN based on Gaussian process is designed to amplify the diversity of samples within the family. The experimental results show that RGB Markov images have better classification performance than grayscale images. On the three datasets, the classification effect after amplification has been improved to varying degrees, and all F1_Score reaches 95 %. Compared with other methods, GMADV has stronger family sample amplification ability and greater adversarial intensity.
What problem does this paper attempt to address?