UltraVCS: Ultra-fine-grained Variable-based Code Slicing for Automated Vulnerability Detection

Tongshuai Wu, Liwei Chen, Gewangzi Du, Dan Meng, Gang Shi
DOI: https://doi.org/10.1109/tifs.2024.3374219
IF: 7.231
2024-01-01
IEEE Transactions on Information Forensics and Security
Abstract: Detecting vulnerabilities in source code with deep learning models is emerging as a valuable research area. The key issue in applying deep learning to vulnerability detection is representing vulnerabilities accurately. Current approaches for detecting vulnerabilities in C/C++ programs use functions or lines of code as the unit of analysis and consider only the basic syntactic structure of vulnerabilities. Unfortunately, functions and lines of code still contain vulnerability-unrelated information, which is redundant for vulnerability features and hinders deep learning models from learning accurate vulnerability patterns. This paper analyzes in depth the essential features of vulnerabilities and attacks. We then propose a novel variable-based deep learning vulnerability detection method for C/C++ that is finer-grained than existing function-based or line-of-code-based methods. Based on the triggering mechanism of vulnerabilities and typical memory attacks, we introduce the concepts of key variables and insecure operations, and use them to define new rules for determining the center point of code slices so that the slices carry more accurate vulnerability features. Using this new center point, we propose the first ultra-fine-grained variable-based code slicing (UltraVCS) method, which focuses on the vulnerability-related variable. The method removes as much vulnerability-unrelated information as possible to achieve more accurate vulnerability feature extraction. Experiments show that, compared to state-of-the-art methods, our approach generates more code slices, represents vulnerabilities more precisely, and detects vulnerabilities more effectively in open-source projects. Furthermore, we have discovered four zero-day vulnerabilities in real-world open-source projects.
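To make the idea of a variable-centered slice concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy straight-line C fragment whose statements are annotated by hand with the variables they define and use, treats each variable reaching an insecure operation as a candidate key variable, and builds one backward data-dependence slice per such variable. The `Stmt` type, the `insecure` flag, and the choice of `memcpy` as the insecure operation are illustrative assumptions; the abstract does not state the paper's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Stmt:
    """One toy statement with hand-written def/use sets (hypothetical model)."""
    text: str
    defs: frozenset = frozenset()
    uses: frozenset = frozenset()
    insecure: bool = False  # assumption: marks an "insecure operation"


def backward_slice(program, center, variable):
    """Keep the center statement plus earlier statements that define
    `variable` or anything it transitively depends on (straight-line code only)."""
    needed = {variable}
    keep = [center]
    for i in range(center - 1, -1, -1):
        stmt = program[i]
        if stmt.defs & needed:
            keep.append(i)
            # The definition satisfies the need; its own inputs become needed.
            needed = (needed - stmt.defs) | stmt.uses
    return [program[i].text for i in sorted(keep)]


def variable_slices(program):
    """One slice per (insecure operation, variable) pair."""
    slices = {}
    for i, stmt in enumerate(program):
        if stmt.insecure:
            for var in sorted(stmt.uses):
                slices[(i, var)] = backward_slice(program, i, var)
    return slices


# Toy C fragment; function names such as read_len are made up for illustration.
PROGRAM = [
    Stmt("char buf[16];", defs=frozenset({"buf"})),
    Stmt("int n = read_len();", defs=frozenset({"n"})),
    Stmt("int unrelated = 42;", defs=frozenset({"unrelated"})),
    Stmt("log_value(unrelated);", uses=frozenset({"unrelated"})),
    Stmt("char *src = get_input();", defs=frozenset({"src"})),
    Stmt("memcpy(buf, src, n);", uses=frozenset({"buf", "src", "n"}), insecure=True),
]

if __name__ == "__main__":
    for (line, var), lines in variable_slices(PROGRAM).items():
        print(f"slice for '{var}' at statement {line}: {lines}")
```

In this sketch a single insecure call yields three slices, one per variable, and none of them contains the statements about `unrelated`. That mirrors, in a simplified way, the abstract's claims that variable-level slicing produces more slices and drops vulnerability-unrelated code; a real slicer would also need control dependences, pointers, and interprocedural flow.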
Categories: Computer Science, Theory & Methods; Engineering, Electrical & Electronic