MalSSL—Self-Supervised Learning for Accurate and Label-Efficient Malware Classification

Setia Juli Irzal Ismail, Hendrawan, Budi Rahardjo, Tutun Juhana, Yasuo Musashi
DOI: https://doi.org/10.1109/access.2024.3392251
IF: 3.9
2024-05-03
IEEE Access
Abstract: Malware classification with supervised learning requires a large labeled dataset, and producing those labels is expensive and time-consuming. In this paper, we explore the efficacy of self-supervised learning techniques for malware classification. We propose MalSSL, a self-supervised learning-based method that uses an image representation of malware to classify it. MalSSL classifies unlabeled malware images using contrastive learning and data augmentation. The model is first trained on the unlabeled Imagenette dataset as a pretext task and then retrained on an unlabeled malware dataset for the downstream tasks. Two downstream tasks were used to evaluate the system: 1) malware family classification and 2) malware vs. benign classification. MalSSL achieves an accuracy of 98.4% for malware family classification on the Malimg dataset and an accuracy of 96.2% on the malware and benign dataset (Maldeb dataset). Our findings suggest that the proposed system accurately classifies malware without the need for labeled data and achieves higher accuracy than other self-supervised methods. This research not only contributes to advancing the state of the art in malware classification but also underscores the potential of self-supervised learning methods as a viable solution for addressing the dynamic landscape of malware threats.
computer science, information systems; telecommunications; engineering, electrical & electronic
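The abstract says MalSSL works on an image representation of malware but does not say how a binary becomes an image. A common approach in Malimg-style work is to read the file as a byte sequence and reshape it into a grayscale image with one byte per pixel. The sketch below assumes that approach. The function names `bytes_to_grayscale` and `resize_nearest`, the default width of 256, and the nearest-neighbor resize to a fixed input size are hypothetical illustrative choices, not details taken from MalSSL.

```python
import math
from typing import List

Image = List[List[int]]


def bytes_to_grayscale(data: bytes, width: int = 256) -> Image:
    """Reshape raw bytes into rows of `width` pixels, one byte (0-255) per pixel.

    The last row is zero-padded so that every row has the same length.
    The default width of 256 is an illustrative assumption, not MalSSL's setting.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    n_rows = max(1, math.ceil(len(data) / width))
    padded = data + bytes(n_rows * width - len(data))  # zero-pad the final row
    return [list(padded[r * width:(r + 1) * width]) for r in range(n_rows)]


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbor resize so every sample has the same input shape (hypothetical step)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    sample = b"\x00\x01\x02\xff\x10"  # stand-in for a real binary's bytes
    img = bytes_to_grayscale(sample, width=2)
    print(img)  # [[0, 1], [2, 255], [16, 0]]
    print(resize_nearest(img, 2, 2))  # [[0, 1], [2, 255]]
```

Because the image height depends on file size, some fixed-size step (resizing, cropping, or padding) is usually needed before feeding a batch to a CNN encoder; nearest-neighbor is used here only because it keeps the sketch dependency-free.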
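The abstract names contrastive learning with data augmentation but not the exact training objective. As a hedged illustration, the sketch below implements the NT-Xent loss popularized by SimCLR, assuming that `z1[k]` and `z2[k]` are encoder embeddings of two differently augmented views of the same malware image. Whether MalSSL uses this particular loss is an assumption, and the names `nt_xent_loss` and `_normalize` and the default temperature of 0.5 are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def nt_xent_loss(z1: Sequence[Vector], z2: Sequence[Vector], temperature: float = 0.5) -> float:
    """NT-Xent (SimCLR-style) contrastive loss for N positive pairs (z1[k], z2[k])."""
    if len(z1) != len(z2) or not z1:
        raise ValueError("z1 and z2 must be non-empty and the same length")
    n = len(z1)
    z = [_normalize(v) for v in list(z1) + list(z2)]  # 2N unit vectors
    # Pairwise cosine similarities scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(zi, zj)) / temperature for zj in z] for zi in z]
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other augmented view of the same sample
        others = [sim[i][j] for j in range(2 * n) if j != i]  # exclude self-similarity
        m = max(others)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(s - m) for s in others))
        total += log_denominator - sim[i][pos]  # -log(softmax probability of the positive)
    return total / (2 * n)
```

With matching views, for example `nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])`, the loss is lower than when the second views are swapped between samples. Minimizing it pulls the two views of each image together and pushes other images in the batch apart, which is how a contrastive model can learn useful features without labels.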
What problem does this paper attempt to address?