Deep Multi-Task Learning for Malware Image Classification

Ahmed Bensaoud, Jugal Kalita
DOI: https://doi.org/10.1016/j.jisa.2021.103057
2024-05-10
Abstract: Malicious software is a pernicious global problem. This paper proposes a novel multi-task learning framework for malware image classification, aimed at accurate and fast malware detection. We generate bitmap (BMP) and Portable Network Graphics (PNG) images from malware features, which we feed to a deep learning classifier. Our state-of-the-art multi-task learning approach has been tested on a new dataset, for which we have collected approximately 100,000 benign and malicious PE, APK, Mach-O, and ELF examples. Experiments on seven tasks, each tested separately with four activation functions (ReLU, LeakyReLU, PReLU, and ELU), demonstrate that PReLU gives the highest accuracy, more than 99.87% on all tasks. Our model can effectively detect a variety of obfuscation methods, such as packing, encryption, and instruction overlapping. This strengthens the case for our model, in addition to its state-of-the-art accuracy.
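The abstract says images are generated from malware features but does not specify the mapping. The sketch below shows one widely used way to turn a binary into an image. It treats each byte as one 8-bit grayscale pixel, lays the bytes out in rows of fixed width, and writes a minimal PNG using only the standard library. This is an illustrative assumption, not the authors' method: the paper builds color BMP/PNG images, and the fixed width of 256 and the function names `bytes_to_grayscale_rows` and `encode_grayscale_png` are hypothetical.

```python
import struct
import zlib


def bytes_to_grayscale_rows(data: bytes, width: int = 256) -> list[bytes]:
    """Split raw file bytes into rows of `width` pixels, one byte per 8-bit gray pixel.

    The last row is zero-padded so every row has exactly `width` pixels.
    (Hypothetical helper; the fixed width is an illustrative assumption.)
    """
    if not data:
        raise ValueError("cannot build an image from an empty file")
    rows = []
    for start in range(0, len(data), width):
        row = data[start:start + width]
        rows.append(row + bytes(width - len(row)))
    return rows


def _png_chunk(tag: bytes, payload: bytes) -> bytes:
    # PNG chunk layout: 4-byte big-endian length, 4-byte type, payload,
    # then a CRC-32 computed over type + payload.
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def encode_grayscale_png(rows: list[bytes]) -> bytes:
    """Encode equal-length rows of 8-bit gray pixels as a minimal, valid PNG file."""
    height, width = len(rows), len(rows[0])
    # IHDR fields: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline starts with a filter-type byte; 0 means "no filter".
    raw = b"".join(b"\x00" + row for row in rows)
    return (b"\x89PNG\r\n\x1a\n"
            + _png_chunk(b"IHDR", ihdr)
            + _png_chunk(b"IDAT", zlib.compress(raw))
            + _png_chunk(b"IEND", b""))
```

To use it, pass the raw bytes of a sample, for example `encode_grayscale_png(bytes_to_grayscale_rows(open(path, "rb").read()))`, and write the result to a `.png` file. Color images, resizing to a fixed input shape, and feature extraction before rendering are all left out of this sketch.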
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
This paper proposes a deep multi-task learning framework for malware image classification, aiming at fast and accurate malware detection. The researchers generate bitmap (BMP) and PNG images from malware features and feed them into a deep learning classifier. They test this multi-task learning approach on a new dataset of approximately 100,000 benign and malicious PE, APK, Mach-O, and ELF samples. The results show that with the PReLU activation function, accuracy exceeds 99.87% on all tasks.

The paper notes that, as computer and network attacks increase, malicious software has become a global problem. Although deep learning has achieved notable results in computer vision, malware classification remains challenging. The authors therefore propose a new multi-task learning architecture that can effectively detect various obfuscation methods, such as packing, encryption, and instruction overlapping, while also achieving state-of-the-art accuracy in this field.

The work includes building a large, modern color-image dataset of malware drawn from files for different operating systems and making it publicly available to the research community. In the experiments, the proposed deep multi-task learning framework reaches an average accuracy of 99.97%, and the work contributes to both static and dynamic malware analysis. The paper also reviews existing malware detection methods, such as visualization-based and deep-learning-based classification, as well as related work on multi-task learning. By converting malware files into images and using multi-task learning, the model performs binary and multi-class malware classification simultaneously, improving detection efficiency and accuracy.
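The core idea of performing binary and multi-class classification at the same time can be made concrete with a small forward pass. This is a minimal pure-Python sketch under loudly stated assumptions. It uses one shared fully connected layer with a PReLU activation and two task heads: benign vs. malicious (sigmoid) and malware family (softmax). The two task losses are combined as a weighted sum. The parameter names, the loss weights `w_bin` and `w_fam`, and the single-layer trunk are all hypothetical. The paper's actual deep architecture and its seven tasks are not reproduced here, and no training step is shown. The four activation functions the abstract compares are included so they can be swapped into the shared layer.

```python
import math


# The four activation functions compared in the paper. The alpha defaults
# below are common choices, assumed for illustration; in PReLU, alpha is learned.
def relu(x: float) -> float:
    return max(0.0, x)


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    return x if x > 0 else alpha * x


def prelu(x: float, alpha: float) -> float:
    # Same shape as LeakyReLU, but alpha is a trainable parameter.
    return x if x > 0 else alpha * x


def elu(x: float, alpha: float = 1.0) -> float:
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def dense(x, weights, bias):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    # Branch on sign to avoid overflow in math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_from_logit(z: float, y: int) -> float:
    """Binary cross-entropy computed directly from the logit (numerically stable)."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def softmax_ce(logits, target: int):
    """Softmax probabilities and cross-entropy via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    probs = [math.exp(v - lse) for v in logits]
    return probs, lse - logits[target]


def multitask_loss(features, params, is_malware, family, w_bin=1.0, w_fam=1.0):
    """Shared PReLU layer feeding two task heads (hypothetical parameter names).

    Returns (weighted total loss, P(malicious), family probabilities).
    """
    shared = [prelu(v, params["alpha"])
              for v in dense(features, params["W_shared"], params["b_shared"])]
    # Task 1: benign vs. malicious (one logit, sigmoid output).
    logit = dense(shared, params["W_bin"], params["b_bin"])[0]
    loss_bin = bce_from_logit(logit, is_malware)
    # Task 2: malware family (one logit per family, softmax output).
    probs, loss_fam = softmax_ce(dense(shared, params["W_fam"], params["b_fam"]), family)
    # Training on this weighted sum lets both tasks shape the shared layer.
    return w_bin * loss_bin + w_fam * loss_fam, sigmoid(logit), probs
```

Replacing `prelu` in the shared layer with `relu`, `leaky_relu`, or `elu` changes only the activation and leaves the rest of the network fixed. That is the kind of comparison the abstract describes, although the real experiments use a full deep network rather than this toy layer.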