Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection

Ira Ceka, Feitong Qiao, Anik Dey, Aastha Valechia, Gail Kaiser, Baishakhi Ray
2024-12-17
Abstract: Despite their remarkable success, large language models (LLMs) have shown limited ability on applied tasks such as vulnerability detection. We investigate various prompting strategies for vulnerability detection and, as part of this exploration, propose a prompting strategy that integrates natural language descriptions of vulnerabilities with a contrastive chain-of-thought reasoning approach, augmented using contrastive samples from a synthetic dataset. Our study highlights the potential of LLMs to detect vulnerabilities by integrating natural language descriptions, contrastive reasoning, and synthetic examples into a comprehensive prompting framework. Our results show that this approach can enhance LLM understanding of vulnerabilities. On a high-quality vulnerability detection dataset such as SVEN, our prompting strategies can improve accuracies, F1-scores, and pairwise accuracies by 23%, 11%, and 14%, respectively.
Cryptography and Security, Artificial Intelligence, Computation and Language, Software Engineering
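The abstract describes a framework that combines natural-language descriptions of vulnerabilities (CWE descriptions), contrastive chain-of-thought reasoning, and contrastive samples from a synthetic dataset; the summary that follows contrasts it with a vanilla prompt and with natural-language instructions alone. As a rough illustration only, the sketch below assembles prompt strings for those three strategies. The template wording, the `ContrastivePair` structure, the function names, and the C snippets are all hypothetical and are not the authors' actual prompts; sending the prompt to a model is out of scope.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContrastivePair:
    """One synthetic contrastive example (hypothetical structure): a vulnerable
    snippet, its fixed counterpart, and the reasoning behind each verdict."""
    vulnerable_code: str
    vulnerable_reasoning: str
    fixed_code: str
    fixed_reasoning: str


def vanilla_prompt(code: str) -> str:
    # Strategy 1: a bare yes/no question with no CWE context and no reasoning.
    return f"Is the following code vulnerable? Answer YES or NO.\n\nCode:\n{code}\n"


def nl_instruction_prompt(code: str, cwe_id: str, cwe_description: str) -> str:
    # Strategy 2: add a natural-language description of the target CWE
    # (e.g. text based on its MITRE CWE entry) as a detection instruction.
    return (
        f"Vulnerability to check: {cwe_id}. {cwe_description}\n"
        f"Does the following code contain {cwe_id}? Answer YES or NO.\n\n"
        f"Code:\n{code}\n"
    )


def contrastive_cot_prompt(
    code: str, cwe_id: str, cwe_description: str, examples: List[ContrastivePair]
) -> str:
    # Strategy 3: CWE description plus few-shot contrastive examples, where each
    # example pairs a vulnerable snippet with its fix and explains both verdicts.
    parts = [f"Vulnerability to check: {cwe_id}. {cwe_description}\n"]
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}a:\n{ex.vulnerable_code}\n"
            f"Reasoning: {ex.vulnerable_reasoning}\nAnswer: YES\n"
        )
        parts.append(
            f"Example {i}b:\n{ex.fixed_code}\n"
            f"Reasoning: {ex.fixed_reasoning}\nAnswer: NO\n"
        )
    # Ask for step-by-step reasoning on the target before the final verdict.
    parts.append(
        f"Reason step by step about whether the following code contains {cwe_id}, "
        f"contrasting it with the examples above, then answer YES or NO.\n\n"
        f"Code:\n{code}\n"
    )
    return "\n".join(parts)


if __name__ == "__main__":
    # Toy synthetic pair for an out-of-bounds write (CWE-787).
    pair = ContrastivePair(
        vulnerable_code="char buf[8]; strcpy(buf, user_input);",
        vulnerable_reasoning="user_input may exceed 8 bytes, so strcpy can write past buf.",
        fixed_code="char buf[8]; snprintf(buf, sizeof buf, \"%s\", user_input);",
        fixed_reasoning="snprintf truncates the copy so it fits in buf.",
    )
    target = "int a[4]; for (int i = 0; i <= 4; i++) a[i] = 0;"
    description = "Writing data past the end, or before the beginning, of the intended buffer."
    print(contrastive_cot_prompt(target, "CWE-787", description, [pair]))
```

Keeping each strategy in its own function makes ablations straightforward: the same target code can be sent under each prompt and the resulting verdicts compared.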
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how prompting strategies can let large language models (LLMs) serve as a proxy for static analysis and improve the accuracy of software vulnerability detection. Specifically, the authors explore different prompting strategies, in particular integrating natural-language descriptions, contrastive chain-of-thought reasoning, and synthetic samples into a prompting framework, to strengthen LLMs' understanding and identification of vulnerabilities.

### Main problems

1. **Limitations of existing methods**:
   - Although LLMs have made significant progress on natural-language processing tasks, their performance on applied tasks such as vulnerability detection is limited.
   - Traditional methods rely on static and dynamic analysis techniques. Deep-learning-based methods have improved on them but still suffer from data leakage, inaccurate labels, and data duplication.
   - Even the most advanced LLMs, such as GPT-4, show limited performance in zero-shot settings.
2. **The need to improve vulnerability-detection performance**:
   - A new method is needed to improve LLM performance on vulnerability detection, especially on high-quality, real-world vulnerability-detection datasets.
   - The authors propose a comprehensive prompting framework that combines natural-language descriptions, contrastive chain-of-thought reasoning, and synthetic samples to improve LLMs' understanding and detection of vulnerabilities.

### Solutions

The authors proposed several prompting strategies and evaluated them experimentally:

1. **Vanilla Prompt**:
   - Asks only whether the code is vulnerable or not vulnerable, without any natural-language description of the vulnerability type or any reasoning chain.
2. **Natural Language Instructions**:
   - Uses CWE-specific detection instructions generated by an LLM from descriptions in the MITRE CWE dictionary and a small number of examples.
3. **Natural Language Description + Contrastive Chain-of-Thought**:
   - Combines natural-language descriptions with contrastive chain-of-thought reasoning, using synthetic samples and step-by-step reasoning to strengthen the LLM's vulnerability-detection ability.

### Experimental results

Through evaluation on multiple datasets, the authors found that:

- These prompting strategies significantly improve accuracy, F1-score, and pairwise accuracy.
- On the SVEN dataset in particular, the prompting strategies improve accuracy, F1-score, and pairwise accuracy by 23%, 11%, and 14%, respectively.

### Summary

This research shows that carefully designed prompting strategies can help LLMs perform better on vulnerability-detection tasks, especially on high-quality, real-world datasets. It points to a direction for future research: further optimizing prompting strategies to make LLMs more effective in the security field.
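The results above are reported as accuracy, F1-score, and pairwise accuracy. The sketch below shows one way these could be computed for binary labels. It assumes 1 = vulnerable and 0 = not vulnerable, computes F1 for the vulnerable class, and uses the definition of pairwise accuracy common for paired datasets such as SVEN (a vulnerable function and its patched version): a pair counts as correct only if the model flags the vulnerable version and clears the patched one. The paper's exact definitions are not given in this summary, and all function names here are hypothetical.

```python
from typing import List, Sequence, Tuple

VULNERABLE = 1  # assumed label convention: 1 = vulnerable
SAFE = 0        # 0 = not vulnerable (e.g. the patched version)


def flatten_pairs(pair_preds: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Turn (pred on vulnerable version, pred on patched version) pairs into
    flat ground-truth and prediction lists for per-function metrics."""
    y_true: List[int] = []
    y_pred: List[int] = []
    for pred_vuln, pred_patched in pair_preds:
        y_true += [VULNERABLE, SAFE]
        y_pred += [pred_vuln, pred_patched]
    return y_true, y_pred


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of functions whose predicted label matches the ground truth."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the vulnerable class: harmonic mean of precision and recall."""
    tp = sum(t == VULNERABLE and p == VULNERABLE for t, p in zip(y_true, y_pred))
    fp = sum(t == SAFE and p == VULNERABLE for t, p in zip(y_true, y_pred))
    fn = sum(t == VULNERABLE and p == SAFE for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pairwise_accuracy(pair_preds: Sequence[Tuple[int, int]]) -> float:
    """A pair is correct only if the vulnerable version is flagged AND the
    patched version is not; flagging both (or neither) earns nothing."""
    if not pair_preds:
        raise ValueError("need at least one pair")
    correct = sum(v == VULNERABLE and p == SAFE for v, p in pair_preds)
    return correct / len(pair_preds)


if __name__ == "__main__":
    pairs = [(1, 0), (1, 1), (0, 0)]  # toy predictions for three pairs
    y_true, y_pred = flatten_pairs(pairs)
    print(round(accuracy(y_true, y_pred), 3))  # 0.667
    print(round(f1_score(y_true, y_pred), 3))  # 0.667
    print(round(pairwise_accuracy(pairs), 3))  # 0.333
```

In this toy run, the middle pair (both versions flagged as vulnerable) still earns one of its two points under plain accuracy but nothing under pairwise accuracy, which is why pairwise accuracy is the stricter test of whether a model can tell a vulnerability apart from its fix.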