Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls

Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, Gérôme Bovet, Gregorio Martínez Pérez
2024-05-15
Abstract: In the current cybersecurity landscape, protecting military devices such as communication and battlefield management systems against sophisticated cyber attacks is crucial. Malware exploits vulnerabilities through stealth methods, often evading traditional detection mechanisms such as software signatures. The application of machine learning (ML) and deep learning (DL) in vulnerability detection has been extensively explored in the literature. However, current ML/DL vulnerability detection methods struggle to understand the context and intent behind complex attacks. Integrating large language models (LLMs) with system call analysis offers a promising approach to enhance malware detection. This work presents a novel framework leveraging LLMs to classify malware based on system call data. The framework uses transfer learning to adapt pre-trained LLMs for malware detection. By retraining LLMs on a dataset of benign and malicious system calls, the models are refined to detect signs of malware activity. Experiments with a dataset of over 1TB of system calls demonstrate that models with larger context sizes, such as BigBird and Longformer, achieve superior accuracy and F1-score of approximately 0.86. The results highlight the importance of context size in improving detection rates and underscore the trade-offs between computational complexity and performance. This approach shows significant potential for real-time detection in high-stakes environments, offering a robust solution to evolving cyber threats.
Cryptography and Security, Machine Learning
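Before a pre-trained LLM can be retrained on system calls, as the abstract describes, each trace has to be turned into text, and the model only sees the part that fits in its context window. The sketch below is a minimal illustration under stated assumptions, not the authors' preprocessing. The helper names `syscalls_to_text`, `truncate_to_context`, and `visible_fraction` are hypothetical. Each system call is treated as one whitespace token, although real subword tokenizers may split a name into several tokens. The 512 and 4,096 limits are the usual maximum sequence lengths of BERT-style and Longformer/BigBird checkpoints, not values taken from the paper.

```python
# Minimal sketch (hypothetical helpers, not the paper's pipeline):
# serialize a system-call trace into a space-separated string and
# truncate it to a model's context window.
# Assumption: one system call == one token. Real LLM tokenizers use
# subwords, so a call name may consume several tokens.

from typing import List

# Assumed maximum sequence lengths: 512 is the usual BERT-style limit,
# 4096 the usual limit for Longformer and BigBird checkpoints.
CONTEXT_SIZES = {"bert-like": 512, "longformer-like": 4096, "bigbird-like": 4096}


def syscalls_to_text(trace: List[str]) -> str:
    """Join system-call names into one space-separated string."""
    return " ".join(trace)


def truncate_to_context(text: str, max_tokens: int) -> List[str]:
    """Split on whitespace and keep only the first max_tokens tokens."""
    return text.split()[:max_tokens]


def visible_fraction(trace: List[str], max_tokens: int) -> float:
    """Fraction of the trace's calls that fit inside the context window."""
    if not trace:
        return 1.0
    kept = truncate_to_context(syscalls_to_text(trace), max_tokens)
    return len(kept) / len(trace)


if __name__ == "__main__":
    # Synthetic trace of 3000 calls (illustrative, not data from the paper).
    trace = ["open", "read", "mmap", "write", "close"] * 600
    for name, size in CONTEXT_SIZES.items():
        print(f"{name}: {visible_fraction(trace, size):.2f}")
```

On the synthetic 3,000-call trace, the 512-token window covers about 17% of the calls, while the 4,096-token windows cover the whole trace. This only gives an intuition for why context size could matter; it says nothing about detection accuracy by itself.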
What problem does this paper attempt to address?
The paper addresses the limited ability of current malware detection methods to understand and identify the context and intent behind complex attacks. Traditional detection mechanisms based on software signatures often fail against malware that exploits vulnerabilities through stealthy methods. This is a particular concern for military devices such as communication and battlefield management systems, which face sophisticated cyber attacks. To improve malware detection, the authors propose a framework that classifies malware from system-call data using large language models (LLMs). Specifically, the research addresses the following issues:
1. **Understanding the context and intent of complex attacks**: Traditional machine learning and deep learning methods struggle to capture the context and intent of complex attacks. Combining LLMs with system-call analysis aims to give a better understanding of the background of malicious behavior.
2. **Adapting to non-natural-language data**: LLMs were originally designed for natural language processing, whereas system-call data does not follow the structure of natural language. A method is therefore needed to convert system-call data into a form that LLMs can process.
3. **Achieving real-time detection**: Military environments demand real-time performance, while LLMs usually require substantial computational resources. Achieving fast detection without sacrificing detection performance is therefore a challenge.
4. **Balancing context length and detection accuracy**: A longer context helps improve detection accuracy, but it also increases computational complexity and latency. The study explores this trade-off across models with different context sizes.
5. **Handling diverse attack vectors**: Malware evolves constantly, so the model must handle a variety of attack patterns without frequent manual updates.
By addressing these problems, the research aims to provide a robust and efficient malware detection solution suited to real-time detection in high-risk environments. Experimental results show that models with larger context sizes, such as BigBird and Longformer, outperform smaller-context models, reaching an accuracy and F1-score of approximately 0.86. These results offer a reference point for future cybersecurity research.
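The context-length trade-off in items 3 and 4 is partly about attention cost. Full self-attention scores every pair of tokens, so its cost grows quadratically with sequence length; Longformer and BigBird reduce this with sparse attention patterns that include a local sliding window. The sketch below counts query-key scores under a deliberately simplified model that keeps only the sliding window and ignores the global (and, for BigBird, random) attention those models also use. The function names and the 512-token window are illustrative assumptions, not settings reported in the paper.

```python
# Simplified comparison of attention cost (illustrative only).
# Full self-attention computes a score for every (query, key) pair.
# A sliding window lets token i attend only to tokens j with
# |i - j| <= window // 2. Global and random attention, which
# Longformer and BigBird add on top of the window, are ignored here.


def full_attention_pairs(n: int) -> int:
    """Number of query-key scores for full self-attention over n tokens."""
    return n * n


def sliding_window_pairs(n: int, window: int) -> int:
    """Number of query-key scores when token i attends to j with |i - j| <= window // 2."""
    half = window // 2
    total = 0
    for i in range(n):
        # Clip the window at the sequence boundaries.
        lo = max(0, i - half)
        hi = min(n - 1, i + half)
        total += hi - lo + 1
    return total


if __name__ == "__main__":
    window = 512  # assumed window size, not a value reported in the paper
    for n in (512, 1024, 4096):
        full = full_attention_pairs(n)
        sparse = sliding_window_pairs(n, window)
        print(f"n={n}: full={full}, sliding={sparse}, ratio={full / sparse:.1f}")
```

When the sequence is much longer than the window, the sliding-window count grows roughly as n × (window + 1), that is, linearly in n, while the full count grows as n². This is one reason larger-context models remain feasible, although latency still rises with input length.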