Generative AI-Based Effective Malware Detection for Embedded Computing Systems

Sreenitha Kasarapu, Sanket Shukla, Rakibul Hassan, Avesta Sasan, Houman Homayoun, Sai Manoj Pudukotai Dinakarrao
2024-04-13
Abstract: One of the pivotal security threats for embedded computing systems is malicious software, a.k.a. malware. Owing to its efficiency and efficacy, Machine Learning (ML) has been widely adopted for malware detection in recent times. Despite being efficient, the existing techniques require a tremendous number of benign and malware samples for training and modeling an efficient malware detector. Furthermore, such constraints limit the detection of emerging malware, because too few samples of it are available for efficient training. To address such concerns, we introduce a code-aware data generation technique that generates multiple mutated samples of the malware that devices have seen only in limited quantities. Loss minimization ensures that the generated samples closely mimic the limitedly seen malware and mitigates impractical samples. The generated malware is then incorporated into the training set to formulate a model that can efficiently detect the emerging malware despite having limited exposure. The experimental results demonstrate that the proposed technique achieves an accuracy of 90% in detecting limitedly seen malware, which is approximately 3x more than the accuracy attained by state-of-the-art techniques.
Cryptography and Security, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to detect malware effectively in embedded computing systems when training samples are limited. Existing techniques require a large number of benign and malware samples for training, but in practice it is very difficult to obtain enough samples of newly emerging malware, which limits how well these methods detect it. In addition, malware developers use code obfuscation, metamorphism, and polymorphism to evade traditional static and dynamic analysis. To address these problems, the paper proposes a code-aware data generation technique based on Generative Adversarial Networks (GANs) that generates mutated malware samples, thereby alleviating the shortage of samples and improving the detection of newly emerging complex malware.

The main contributions of the paper are:

1. **Introduced a code-aware generative AI architecture** for enlarging the training data set.
2. **Adopted a loss-minimization technique** to ensure that the generated data captures the code patterns and functionality of complex malware observed only in limited quantities.
3. **Used few-shot learning** to efficiently classify complex stealthy malware and code-obfuscated malware.

The experimental results show that the proposed technique achieves an accuracy of about 90% when using only limited samples, which is about 9% higher than a classifier trained with only limited samples.
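The generate-then-filter idea behind the first two contributions can be illustrated with a deliberately simplified sketch. It is not the paper's GAN: a random perturbation (`mutate`) stands in for the generator, and a nearest-sample mean squared error stands in for the loss, both of which are assumptions made for illustration. The function names, the feature-vector representation, and the parameters `n_candidates`, `keep`, and `scale` are likewise hypothetical.

```python
import random


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mutate(sample, rng, scale=0.1):
    """Hypothetical stand-in for a generator: add Gaussian jitter to each feature."""
    return [x + rng.gauss(0.0, scale) for x in sample]


def augment(seen, n_candidates, keep, seed=0):
    """Propose mutated candidates and keep the ones with the lowest loss.

    The loss used here (an assumption, not the paper's loss) is the MSE
    between a candidate and its nearest limitedly seen sample.
    """
    rng = random.Random(seed)
    candidates = [mutate(rng.choice(seen), rng) for _ in range(n_candidates)]
    # Sort by loss so that implausible (high-loss) candidates fall to the end.
    ranked = sorted(candidates, key=lambda c: min(mse(c, s) for s in seen))
    return ranked[:keep]


# Two limitedly seen malware samples, represented as toy feature vectors.
seen_malware = [[0.0, 1.0, 0.5], [1.0, 0.0, 0.25]]
generated = augment(seen_malware, n_candidates=50, keep=10)

# The augmented training set combines the original and the retained samples.
training_malware = seen_malware + generated
```

In this sketch the filtering step plays the role the summary assigns to loss minimization: candidates that drift too far from the observed malware are discarded before the training set is extended.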
### Formula Representation

- The data set \(D\) contains four types of samples: benign samples \(B\), traditional malware \(M\), randomly obfuscated malware \(O_m\), and stealthy malware \(S_m\):
  \[ D = \{B + M + O_m + S_m\} \]
- The limited-data version of the data set, \(D_x^l\), is randomly drawn from the entire data set \(D_n\) and contains no more than \(\nabla\%\) of the original number of samples \(n\):
  \[ D_x^l \subset D_n; \quad \forall x \leq \nabla\% n \]
- The classifier \(C\) needs to be able to distinguish between benign samples and the various types of malware in the limited samples:
  \[ C : (D_l) \Rightarrow (B, M, O_m, S_m) \]

In this way, the paper aims to improve the detection performance of complex malware under the condition of limited samples.
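The limited-data constraint above can be sketched in a few lines of Python. This is a minimal illustration that assumes the data set is a list of `(features, label)` pairs; the function name `draw_limited_subset`, the `seed` parameter, and the toy data are hypothetical and do not come from the paper.

```python
import random


def draw_limited_subset(dataset, nabla_percent, seed=0):
    """Randomly draw at most nabla_percent % of the samples from dataset."""
    if not 0 <= nabla_percent <= 100:
        raise ValueError("nabla_percent must be between 0 and 100")
    # Floor division keeps the subset size x at or below nabla% of n.
    limit = int(len(dataset) * nabla_percent // 100)
    rng = random.Random(seed)
    # random.Random.sample draws without replacement, so the result is a subset.
    return rng.sample(dataset, limit)


# Toy data set D_n with the four classes named in the formulas.
labels = ["B", "M", "O_m", "S_m"]
full_dataset = [(i, labels[i % 4]) for i in range(200)]

limited = draw_limited_subset(full_dataset, nabla_percent=10)
print(len(limited))  # 20, i.e. 10% of 200
```

A classifier trained on `limited` alone corresponds to the limited-sample baseline described in the summary.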