Abstract: Despite their remarkable success, large language models (LLMs) have shown limited ability on applied tasks such as vulnerability detection. We investigate various prompting strategies for vulnerability detection and, as part of this exploration, propose a prompting strategy that integrates natural language descriptions of vulnerabilities with a contrastive chain-of-thought reasoning approach, augmented using contrastive samples from a synthetic dataset. Our study highlights the potential of LLMs to detect vulnerabilities by integrating natural language descriptions, contrastive reasoning, and synthetic examples into a comprehensive prompting framework. Our results show that this approach can enhance LLM understanding of vulnerabilities. On a high-quality vulnerability detection dataset such as SVEN, our prompting strategies can improve accuracies, F1-scores, and pairwise accuracies by 23%, 11%, and 14%, respectively.

What problem does this paper attempt to address?

The paper asks whether prompting strategies can let large language models (LLMs) serve as a proxy for static analysis and thereby improve the accuracy of software vulnerability detection. Specifically, the authors explore several prompting strategies, in particular integrating natural-language descriptions, contrastive chain-of-thought reasoning, and synthetic samples into a single prompting framework, to improve how well LLMs understand and identify vulnerabilities.

### Main problems

1. **Limitations of existing methods**:
   - Although LLMs have made significant progress on natural-language processing tasks, their performance on applied tasks such as vulnerability detection is limited.
   - Traditional methods rely on static and dynamic analysis techniques. Deep-learning-based methods have improved on them, but still suffer from problems such as data leakage, inaccurate labels, and data duplication.
   - Even the most advanced LLMs, such as GPT-4, show limited performance in zero-shot settings.
2. **The need to improve vulnerability-detection performance**:
   - Researchers need a new method to improve LLM performance in vulnerability detection, especially on high-quality, real-world vulnerability-detection datasets.
   - The paper proposes a comprehensive prompting framework that combines natural-language descriptions, contrastive chain-of-thought reasoning, and synthetic samples to improve the LLMs' understanding and detection of vulnerabilities.

### Solutions

The authors propose several prompting strategies and evaluate them experimentally:

1. **Vanilla Prompt**:
   - Asks only whether the code is vulnerable or not vulnerable, without any natural-language description of the vulnerability type and without a reasoning chain.
2. **Natural Language Instructions**:
   - Uses CWE-specific detection instructions generated by an LLM, including instructions generated from descriptions taken from the MITRE CWE dictionary and from a small number of examples.
3. **Natural Language Description + Contrastive Chain-of-Thought**:
   - Combines natural-language descriptions with contrastive chain-of-thought reasoning, using synthetic samples to strengthen the LLMs' vulnerability-detection ability.

### Experimental results

Through evaluation on multiple datasets, the authors found that:

- These prompting strategies improve accuracy, F1-score, and pairwise accuracy.
- On the SVEN dataset in particular, the prompting strategies can improve accuracy by 23%, F1-score by 11%, and pairwise accuracy by 14%.

### Summary

This research shows that, with carefully designed prompting strategies, LLMs can achieve better performance on vulnerability-detection tasks, especially on high-quality, real-world datasets. It also points to a direction for future research: further optimizing prompting strategies to make LLMs more effective in the security field.
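The three prompting strategies listed under Solutions can be illustrated with a minimal Python sketch. It only shows how the prompts differ in structure. The template wording, the `ContrastivePair` type, and every function name here are assumptions made for illustration; the paper's actual prompts and synthetic examples are not reproduced in this summary.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ContrastivePair:
    """Hypothetical synthetic example: a vulnerable snippet and its fixed twin."""
    vulnerable_code: str
    vulnerable_reasoning: str
    patched_code: str
    patched_reasoning: str


# Assumed question wording; the paper's exact phrasing may differ.
QUESTION = "Is the following code vulnerable? Answer 'vulnerable' or 'not vulnerable'."


def vanilla_prompt(code: str) -> str:
    """Strategy 1: ask the bare question, with no description or reasoning chain."""
    return f"{QUESTION}\n\nCode:\n{code}\n"


def instruction_prompt(code: str, cwe_id: str, instructions: str) -> str:
    """Strategy 2: prepend CWE-specific detection instructions to the question."""
    return f"Detection instructions for {cwe_id}:\n{instructions}\n\n" + vanilla_prompt(code)


def contrastive_cot_prompt(
    code: str, cwe_id: str, description: str, pairs: Sequence[ContrastivePair]
) -> str:
    """Strategy 3: CWE description plus contrastive worked examples with reasoning."""
    parts = [f"Description of {cwe_id}:\n{description}\n"]
    for i, pair in enumerate(pairs, start=1):
        # Each pair contrasts a vulnerable sample with its non-vulnerable counterpart.
        parts.append(
            f"Example {i}a:\n{pair.vulnerable_code}\n"
            f"Reasoning: {pair.vulnerable_reasoning}\nAnswer: vulnerable\n"
        )
        parts.append(
            f"Example {i}b:\n{pair.patched_code}\n"
            f"Reasoning: {pair.patched_reasoning}\nAnswer: not vulnerable\n"
        )
    parts.append("Reason step by step as in the examples, then give your answer.\n")
    parts.append(vanilla_prompt(code))
    return "\n".join(parts)


if __name__ == "__main__":
    # Toy inputs, invented for this sketch only.
    pair = ContrastivePair(
        vulnerable_code="char buf[8]; strcpy(buf, src);",
        vulnerable_reasoning="strcpy copies without checking the size of buf.",
        patched_code="char buf[8]; strncpy(buf, src, sizeof(buf) - 1); buf[7] = 0;",
        patched_reasoning="The copy is bounded by the size of buf and terminated.",
    )
    print(contrastive_cot_prompt("memcpy(dst, src, n);", "CWE-787",
                                 "Out-of-bounds write.", [pair]))
```

Keeping each strategy as a separate function that reuses `vanilla_prompt` makes the added context the only variable between prompts, which is the kind of comparison the summary describes.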

Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection

Chain-of-Thought Prompting of Large Language Models for Discovering and Fixing Software Vulnerabilities

DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection

Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures

Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context

Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities

Prompt-Enhanced Software Vulnerability Detection Using ChatGPT

Prompting Techniques for Secure Code Generation: A Systematic Investigation

Software Vulnerability and Functionality Assessment using LLMs

LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations

SoK: Prompt Hacking of Large Language Models

PromptAid: Prompt Exploration, Perturbation, Testing and Iteration using Visual Analytics for Large Language Models

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

A Preliminary Study on Using Large Language Models in Software Pentesting

Harnessing the Power of LLMs in Source Code Vulnerability Detection

Automated Software Vulnerability Patching using Large Language Models

PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models

Mitigating Exaggerated Safety in Large Language Models

Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models

MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants