Abstract: Software vulnerability detection is generally supported by automated static analysis tools, which have recently been reinforced by deep learning (DL) models. However, despite the superior performance of DL-based approaches over rule-based ones in research, applying DL approaches to software vulnerability detection in practice remains a challenge due to the complex structure of source code, the black-box nature of DL, and the domain knowledge required to understand and validate the black-box results for addressing tasks after detection. Conventional DL models are trained on specific projects and, hence, excel in identifying vulnerabilities in these projects but not in others. Their poor detection performance on other projects in turn degrades downstream tasks such as vulnerability localization and repair. More importantly, these models do not provide explanations that help developers comprehend detection results. In contrast, Large Language Models (LLMs) have made considerable progress in addressing these issues by leveraging prompting techniques. Unfortunately, their performance in identifying vulnerabilities is unsatisfactory. This paper contributes **DLAP**, a **D**eep **L**earning **A**ugmented LLMs **P**rompting framework that combines the best of both DL models and LLMs to achieve exceptional vulnerability detection performance. Experimental evaluation results confirm that DLAP outperforms state-of-the-art prompting frameworks, including role-based prompts, auxiliary information prompts, chain-of-thought prompts, and in-context learning prompts, as well as fine-tuning on multiple metrics.
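The abstract reports comparisons "on multiple metrics" without naming them here. As a hedged illustration, the sketch below computes metrics commonly used for binary vulnerability detection (precision, recall, F1, and the Matthews correlation coefficient, MCC) from confusion-matrix counts. Which metrics DLAP actually reports is an assumption not confirmed by this text, and the function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn: true/false positives and negatives, where "positive" means
    the detector flagged the code as vulnerable.
    """
    # Guard each ratio against a zero denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC uses all four cells of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Toy counts for illustration only (not results from the paper).
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

Because MCC accounts for true negatives as well as errors on both classes, it is less sensitive than accuracy to the class imbalance typical of vulnerability datasets, which is one reason detectors are usually compared on several metrics at once.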

What problem does this paper attempt to address?

This paper attempts to solve several key problems in software vulnerability detection:

1. **Insufficient generalization of existing deep learning (DL) models in practical projects**: Although DL-based models perform well in research settings, in practice the complex structure of source code, their black-box nature, and the domain knowledge they demand make it difficult for them to maintain consistently high performance across different projects. In addition, these models do not provide interpretable results for developers, which limits the effectiveness of downstream tasks such as locating and fixing vulnerabilities.
2. **Poor performance of large language models (LLMs) on vulnerability detection**: Although LLMs have made significant progress on tasks such as dialogue generation and code generation, their performance in vulnerability detection is unsatisfactory. The main reason is the improper use of prompt engineering, which leaves their performance on this specific task worse than expected.
3. **Combining the strengths of DL models and LLMs to improve vulnerability detection**: Existing work that applies LLMs to vulnerability detection usually supplies only limited input information and therefore fails to make full use of their capabilities. A new framework is needed that integrates the strengths of DL models and LLMs, overcomes their respective limitations, and thereby improves overall detection performance.

To this end, the paper proposes **DLAP** (Deep Learning Augmented LLMs Prompting framework), an enhanced prompting framework that aims to achieve excellent vulnerability detection performance by combining the best features of DL models and LLMs.
Specifically, DLAP addresses these problems as follows:

- It uses pre-trained DL models to build implicitly fine-tuned prompts customized to the target project, adapting to that project's specific characteristics.
- It adopts two state-of-the-art prompting techniques, in-context learning (ICL) prompts and chain-of-thought (CoT) prompts, which are used, respectively, to generate candidate code segments with their prediction probabilities and to synthesize the results of static analysis tools and the pre-trained DL models into queries.
- Experiments show that DLAP outperforms existing prompting frameworks on multiple evaluation metrics and achieves performance close to extensive fine-tuning at lower cost, while generating more explanatory text that helps developers understand and use automated static analysis tools (ASATs) for vulnerability detection.

In summary, the core problem of this paper is how to combine DL models with LLMs through innovative prompt engineering, overcoming the limitations of existing methods in real-world software vulnerability detection and significantly improving detection performance.
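To make the prompt-assembly idea concrete, here is a minimal sketch, not DLAP's actual implementation or prompt template. It assumes a hypothetical `CodeSample` record holding a code snippet and the vulnerability probability produced by a pre-trained DL model, picks ICL demonstrations by token Jaccard similarity (a simple stand-in chosen only for illustration), and then appends static-analysis findings, the DL estimate for the target, and a step-by-step (CoT-style) instruction. All names (`CodeSample`, `token_jaccard`, `select_demonstrations`, `build_prompt`), the similarity measure, and the prompt wording are assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CodeSample:
    """Hypothetical record: a code snippet plus a DL model's vulnerability probability."""
    code: str
    dl_probability: float          # assumed output of a pre-trained DL detector
    label: Optional[bool] = None   # ground truth, known only for ICL demonstrations


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of word-token sets (illustrative stand-in, not DLAP's method)."""
    ta, tb = set(re.findall(r"\w+", a)), set(re.findall(r"\w+", b))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def select_demonstrations(target: CodeSample, pool: List[CodeSample], k: int) -> List[CodeSample]:
    """Pick the k labeled samples most similar to the target as ICL examples."""
    labeled = [s for s in pool if s.label is not None]
    return sorted(labeled, key=lambda s: token_jaccard(s.code, target.code), reverse=True)[:k]


def build_prompt(target: CodeSample, pool: List[CodeSample],
                 asat_findings: List[str], k: int = 2) -> str:
    """Assemble an ICL + CoT style prompt augmented with DL and static-analysis results."""
    lines = ["You are a software security expert."]
    # ICL part: labeled demonstrations annotated with the DL model's probability.
    for i, demo in enumerate(select_demonstrations(target, pool, k), start=1):
        verdict = "vulnerable" if demo.label else "not vulnerable"
        lines += [f"Example {i} (DL model probability: {demo.dl_probability:.2f})",
                  demo.code, f"Answer: {verdict}", ""]
    # Query part: static-analysis findings plus the DL estimate for the target.
    lines.append("Static analysis findings for the target code:")
    if asat_findings:
        lines += [f"- {finding}" for finding in asat_findings]
    else:
        lines.append("- none reported")
    lines += [f"DL model probability that the target is vulnerable: {target.dl_probability:.2f}",
              "Target code:", target.code,
              # CoT-style instruction asking for reasoning before the verdict.
              "Think step by step: weigh each finding and the DL estimate, "
              "then answer 'vulnerable' or 'not vulnerable' and explain why."]
    return "\n".join(lines)


# Toy usage with made-up snippets and probabilities (illustrative only).
pool = [
    CodeSample("strcpy(buf, input);", 0.91, True),
    CodeSample("strncpy(buf, input, sizeof(buf) - 1);", 0.12, False),
    CodeSample("int x = a + b;", 0.05, False),
]
target = CodeSample("strcpy(dest, user_input);", 0.87)
prompt = build_prompt(target, pool, ["possible buffer overflow in strcpy (CWE-120)"], k=1)
print(prompt)
```

Keeping demonstration selection, evidence gathering, and the final instruction in separate steps makes it easy to swap in a different similarity measure or a different source of static-analysis findings without touching the rest of the prompt.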

DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection

Chain-of-Thought Prompting of Large Language Models for Discovering and Fixing Software Vulnerabilities

Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Large Language Model for Vulnerability Detection: Emerging Results and Future Directions

How Far Have We Gone in Vulnerability Detection Using Large Language Models

VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models

LLbezpeky: Leveraging Large Language Models for Vulnerability Detection

Automated Software Vulnerability Patching using Large Language Models

Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities

A Preliminary Study on Using Large Language Models in Software Pentesting

ProRLearn: boosting prompt tuning-based vulnerability detection by reinforcement learning

Prompt-Enhanced Software Vulnerability Detection Using ChatGPT

VDDL: A Deep Learning-Based Vulnerability Detection Model for Smart Contracts

Automatic and Universal Prompt Injection Attacks against Large Language Models

Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection

VDDA: An Effective Software Vulnerability Detection Model Based on Deep Learning and Attention Mechanism

GRACE: Empowering LLM-based software vulnerability detection with graph structure and in-context learning

VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching

Deep Learning based Vulnerability Detection: Are We There Yet?

AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models