A Novel Approach to Malicious Code Detection Using CNN-BiLSTM and Feature Fusion

Lixia Zhang, Tianxu Liu, Kaihui Shen, Cheng Chen
2024-10-12
Abstract: With the rapid advancement of Internet technology, the threat of malware to computer systems and network security has intensified. Malware affects individual privacy and security and poses risks to critical infrastructures of enterprises and nations. The increasing quantity and complexity of malware, along with its concealment and diversity, challenge traditional detection techniques. Static detection methods struggle against variants and packed malware, while dynamic methods face high costs and risks that limit their application. Consequently, there is an urgent need for novel and efficient malware detection techniques to improve accuracy and robustness. This study first employs the MinHash algorithm to convert binary files of malware into grayscale images, followed by the extraction of global and local texture features using the GIST and LBP algorithms. Additionally, the study utilizes IDA Pro to disassemble the samples and extract opcode sequences, applying N-gram and TF-IDF algorithms for feature vectorization. The fusion of these features enables the model to comprehensively capture the characteristics of malware. In terms of model construction, a CNN-BiLSTM fusion model is designed to simultaneously process image features and opcode sequences, enhancing classification performance. Experimental validation on multiple public datasets demonstrates that the proposed method significantly outperforms traditional detection techniques in terms of accuracy, recall, and F1 score, and that it is more stable when detecting variants and obfuscated malware. The research presented in this paper offers new insights into the development of malware detection technologies, validating the effectiveness of feature and model fusion, and holds promising application prospects.
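The image branch described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the MinHash step and the GIST descriptor are omitted, the byte-to-pixel mapping is the common one-byte-per-pixel convention, and the function names and the `width=16` default are hypothetical choices made for this example. It shows only how raw bytes might become a grayscale grid and how a basic 3x3 LBP histogram could summarize local texture.

```python
from typing import List

# Clockwise 8-neighbour offsets, starting at the top-left pixel.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def bytes_to_grayscale(data: bytes, width: int = 16) -> List[List[int]]:
    """Reshape raw bytes into rows of `width` pixels (0-255).

    Assumption: one byte maps to one pixel; the last row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padded = data + bytes((-len(data)) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def lbp_histogram(img: List[List[int]]) -> List[float]:
    """Return a normalized 256-bin histogram of basic 3x3 LBP codes.

    Each interior pixel is compared with its 8 neighbours; a neighbour that
    is >= the centre sets the corresponding bit of the 8-bit code.
    Border pixels are skipped because they lack a full neighbourhood.
    """
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[r]) - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(_OFFSETS):
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    if total == 0:
        return [0.0] * 256
    return [h / total for h in hist]


# Usage: a 12-byte toy "binary" laid out as a 3x4 image.
toy_image = bytes_to_grayscale(bytes(range(12)), width=4)
toy_texture = lbp_histogram(toy_image)
```

The resulting 256-value vector is a fixed-length local texture descriptor regardless of file size, which is what makes it convenient to combine with other feature vectors.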
Cryptography and Security, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the shortcomings of current malware detection techniques when faced with increasingly complex and diverse malware. Specifically:

1. **Limitations of traditional detection methods**:
   - **Static detection methods**: Although they are fast and do not require code execution, they struggle with malware that has been obfuscated, packed, or mutated.
   - **Dynamic detection methods**: Although they can handle obfuscated code and identify malicious behaviors, they have high computational costs and long analysis times, and malware can detect and evade them.
2. **Requirements for new detection methods**:
   - With the rapid development of Internet technology, malware poses an increasingly serious threat to computer systems and network security. It affects personal privacy and security and also poses risks to the critical infrastructure of enterprises and countries.
   - The number and complexity of malware samples are constantly increasing, and their concealment and diversity pose great challenges to traditional detection techniques.
3. **Research objectives**:
   - Propose a static malware detection method based on deep learning that fuses image texture features and opcode sequence features to improve the accuracy and robustness of detection.
   - Design and implement a fusion model combining a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory network (BiLSTM) to process image features and opcode sequences simultaneously and enhance classification performance.
4. **Main contributions**:
   - **Feature level**: Propose a method of fusing image texture features and opcode features that overcomes the limitations of single-feature extraction and captures the static characteristics of malware along multiple dimensions.
   - **Model level**: Design and implement a model fusing CNN and BiLSTM that uses the strengths of both to process image and sequence data and significantly improves the accuracy and efficiency of malware detection.
   - **Experimental level**: Through experiments on multiple public datasets, show that the proposed method performs better across various types of malware (especially variant and obfuscated samples), with higher stability and robustness.

In summary, the paper aims to overcome the shortcomings of existing malware detection techniques against complex and diverse malware, improving detection accuracy and robustness with a new deep-learning method.
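The opcode branch of the feature extraction (N-gram plus TF-IDF) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the paper's pipeline: the opcode lists are made-up stand-ins for sequences that would come from a disassembler, the function names are hypothetical, and the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1` is one common variant, since the source does not state the exact weighting used.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

NGram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int = 2) -> List[NGram]:
    """Slide a window of length n over an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def tfidf_vectors(
    samples: Sequence[Sequence[str]], n: int = 2
) -> Tuple[List[NGram], List[List[float]]]:
    """Return (vocabulary, one TF-IDF vector per sample).

    TF is the n-gram count divided by the sample's total n-gram count.
    IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1 (an assumption).
    """
    counts_per_sample = [Counter(opcode_ngrams(s, n)) for s in samples]
    vocab = sorted({g for counts in counts_per_sample for g in counts})
    n_docs = len(samples)
    # Iterating a Counter yields each distinct n-gram once per sample,
    # so this counts document frequency, not raw occurrences.
    doc_freq = Counter(g for counts in counts_per_sample for g in counts)
    idf = {g: math.log((1 + n_docs) / (1 + doc_freq[g])) + 1.0 for g in vocab}

    vectors = []
    for counts in counts_per_sample:
        total = sum(counts.values())
        vectors.append(
            [(counts[g] / total) * idf[g] if total else 0.0 for g in vocab]
        )
    return vocab, vectors


# Usage with two hypothetical opcode sequences.
toy_samples = [["push", "mov", "call"], ["push", "mov", "ret"]]
toy_vocab, toy_vectors = tfidf_vectors(toy_samples, n=2)
```

In this toy case the bigram shared by both samples, `("push", "mov")`, receives a lower weight than the bigrams unique to one sample, which is the behavior that makes TF-IDF useful for separating malware families by their distinctive instruction patterns.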