How Far Have We Gone in Vulnerability Detection Using Large Language Models

Zeyu Gao, Hao Wang, Yuchen Zhou, Wenyu Zhu, Chao Zhang
2023-12-22
Abstract: As software becomes increasingly complex and prone to vulnerabilities, automated vulnerability detection is critically important, yet challenging. Given the significant successes of large language models (LLMs) in various tasks, there is growing anticipation of their efficacy in vulnerability detection. However, a quantitative understanding of their potential in vulnerability detection is still missing. To bridge this gap, we introduce a comprehensive vulnerability benchmark VulBench. This benchmark aggregates high-quality data from a wide range of CTF (Capture-the-Flag) challenges and real-world applications, with annotations for each vulnerable function detailing the vulnerability type and its root cause. Through our experiments encompassing 16 LLMs and 6 state-of-the-art (SOTA) deep learning-based models and static analyzers, we find that several LLMs outperform traditional deep learning approaches in vulnerability detection, revealing an untapped potential in LLMs. This work contributes to the understanding and utilization of LLMs for enhanced software security.
Artificial Intelligence, Computation and Language, Cryptography and Security
What problem does this paper attempt to address?
This paper evaluates the effectiveness and potential of large language models (LLMs) in software vulnerability detection. Specifically, it aims to:

1. **Quantitatively evaluate the performance of LLMs**: Using a comprehensive vulnerability benchmark dataset, VulBench, the paper runs large-scale experiments on 16 LLMs and 6 state-of-the-art deep learning models and static analysis tools to measure their performance in vulnerability detection.
2. **Fill the gaps in existing research**: Although LLMs have shown strong capabilities in many fields, they have not yet been sufficiently evaluated in quantitative terms for vulnerability detection. The paper's detailed experimental results address this gap.
3. **Improve the quality and accuracy of datasets**: Existing vulnerability datasets are often low in quality and accuracy, which leads to low detection accuracy. The paper therefore constructs the high-quality VulBench dataset, which covers vulnerabilities from CTF challenges and real-world applications and annotates each one with its vulnerability type and root cause.
4. **Explore the untapped potential of LLMs in vulnerability detection**: The experiments show that some LLMs outperform traditional deep learning methods in vulnerability detection, revealing untapped potential in this field.

### Main contributions of the paper

- **First large-scale study**: Quantitatively measures the performance of 16 LLMs in vulnerability detection and compares them with state-of-the-art deep learning models and static analysis tools.
- **Introduction of the VulBench dataset**: Addresses the quality problems of existing datasets with a more accurate and comprehensive dataset for evaluating vulnerability detection models, including a natural-language description of each vulnerability.
- **Reveal the untapped potential of LLMs**: The results offer new insights and directions for future research and demonstrate the potential advantages of LLMs in vulnerability detection.
- **Publish the dataset publicly**: To promote future research, the paper releases the VulBench dataset on GitHub.

### Summary

By constructing a high-quality dataset and conducting large-scale experiments, the paper systematically evaluates the performance of LLMs in vulnerability detection, reveals their untapped potential, and lays the foundation for future related research. This improves our understanding of how LLMs can be applied to software security and opens new avenues for automated vulnerability detection.