DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection

Yanjing Yang, Xin Zhou, Runfeng Mao, Jinwei Xu, Lanxin Yang, Yu Zhangm, Haifeng Shen, He Zhang
2024-05-02
Abstract: Software vulnerability detection is generally supported by automated static analysis tools, which have recently been reinforced by deep learning (DL) models. However, despite the superior performance of DL-based approaches over rule-based ones in research, applying DL approaches to software vulnerability detection in practice remains a challenge due to the complex structure of source code, the black-box nature of DL, and the domain knowledge required to understand and validate the black-box results for addressing tasks after detection. Conventional DL models are trained on specific projects and, hence, excel in identifying vulnerabilities in these projects but not in others. The poor detection performance of these models then affects downstream tasks such as vulnerability localization and repair. More importantly, these models do not provide explanations for developers to comprehend detection results. In contrast, Large Language Models (LLMs) have made considerable progress in addressing these issues by leveraging prompting techniques. Unfortunately, their performance in identifying vulnerabilities is unsatisfactory. This paper contributes DLAP, a Deep Learning Augmented LLMs Prompting framework that combines the best of both DL models and LLMs to achieve exceptional vulnerability detection performance. Experimental evaluation results confirm that DLAP outperforms state-of-the-art prompting frameworks, including role-based prompts, auxiliary information prompts, chain-of-thought prompts, and in-context learning prompts, as well as fine-tuning on multiple metrics.
Software Engineering, Cryptography and Security
What problem does this paper attempt to address?
This paper attempts to solve several key problems in software vulnerability detection:

1. **Insufficient generalization of existing deep learning (DL) models in practical projects**: Although DL-based models perform well in research, in practice they struggle to maintain consistently high performance across different projects because of the complex structure of source code, the black-box nature of DL, and the domain knowledge needed to validate their results. These models also give developers no interpretable explanation of their output, which undermines downstream tasks such as locating and fixing vulnerabilities.
2. **Poor performance of large language models (LLMs) on vulnerability detection**: Although LLMs have made significant progress on tasks such as dialogue generation and code generation, their performance in vulnerability detection is unsatisfactory. The main reason is improper use of prompt engineering, which leaves their performance on this specific task below expectations.
3. **Combining the advantages of DL and LLMs to improve detection effectiveness**: Existing work that applies LLMs to vulnerability detection usually supplies only limited input information and fails to fully use the LLM's capabilities. A new framework is therefore needed that integrates the strengths of DL and LLMs and overcomes their respective limitations.

To this end, the paper proposes **DLAP** (Deep Learning Augmented LLMs Prompting framework), a prompting framework that aims to achieve excellent vulnerability detection performance by combining the best features of DL models and LLMs.
Specifically, DLAP addresses these problems in the following ways:

- It uses pre-trained DL models to customize prompts for the target project, implicitly fine-tuning the LLM to that project's characteristics.
- It adopts two state-of-the-art prompting techniques, in-context learning (ICL) prompts and chain-of-thought (CoT) prompts. The ICL prompts supply candidate code segments together with their prediction probabilities, and the CoT prompts synthesize the results of static scanning tools and pre-trained DL models as queries.
- Experimental verification shows that DLAP outperforms existing prompting frameworks on multiple evaluation metrics and achieves an effect close to extensive fine-tuning at a lower cost, while generating more explanatory text that helps developers understand and use automated static analysis tools (ASATs) for vulnerability detection.

In summary, the core contribution of this paper is to combine DL models with LLMs through prompt engineering in order to overcome the limitations of existing methods in practical software vulnerability detection and significantly improve detection performance.
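The way DL outputs and static-analysis results feed into a prompt can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Candidate` class, the function names, the prompt wording, and the example inputs are all hypothetical, and the probabilities are assumed to come from some pre-trained detection model that is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical container: a code segment similar to the target,
    paired with a DL model's predicted vulnerability probability."""
    code: str
    probability: float  # assumed to lie in [0, 1]


def build_icl_section(candidates: List[Candidate]) -> str:
    """In-context learning part: list candidate code with its probability."""
    lines = ["Similar code with the detection model's vulnerability probability:"]
    for i, cand in enumerate(candidates, start=1):
        lines.append(f"Example {i} (probability {cand.probability:.2f}):")
        lines.append(cand.code)
    return "\n".join(lines)


def build_cot_section(static_findings: List[str], target_probability: float) -> str:
    """Chain-of-thought part: turn static-tool findings and the DL
    probability for the target into an ordered list of reasoning steps."""
    steps = [
        f"Step {i}: examine the static-analysis finding: {finding}"
        for i, finding in enumerate(static_findings, start=1)
    ]
    next_step = len(static_findings) + 1
    steps.append(
        f"Step {next_step}: weigh the detection model's probability "
        f"of {target_probability:.2f} for the target code."
    )
    steps.append(
        f"Step {next_step + 1}: state whether the code is vulnerable and explain why."
    )
    return "\n".join(steps)


def build_prompt(
    target_code: str,
    candidates: List[Candidate],
    static_findings: List[str],
    target_probability: float,
) -> str:
    """Join the ICL examples, the target code, and the CoT steps."""
    return "\n\n".join([
        build_icl_section(candidates),
        "Target code:\n" + target_code,
        build_cot_section(static_findings, target_probability),
    ])
```

Calling `build_prompt("memcpy(dst, src, n);", [Candidate("strcpy(buf, src);", 0.91)], ["possible buffer overflow"], 0.70)` returns a single string that could be sent to an LLM. Keeping the ICL and CoT parts in separate functions mirrors the two prompting techniques named above and lets each be changed independently.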