Abstract: Open-source software (OSS) has profoundly transformed software development by making code reuse effortless. In recent years, however, the number of disclosed OSS vulnerabilities has risen alarmingly, posing significant security risks to downstream users. Analyzing existing vulnerabilities and precisely assessing their threat to downstream applications has therefore become pivotal. Much recent work addresses this problem, for example through vulnerability reachability analysis and vulnerability reproduction. The key to these tasks is identifying the vulnerable function (i.e., the function where the root cause of a vulnerability resides). However, public vulnerability datasets (e.g., NVD) rarely include this information, because pinpointing the exact vulnerable function remains a longstanding challenge. Existing methods mainly detect vulnerable functions from vulnerability patches or Proofs-of-Concept (PoCs). Such methods are limited by data availability and require extensive manual effort, which hinders scalability. To address this issue, we propose VFFinder, a novel approach that uses Large Language Models (LLMs) to localize vulnerable functions from Common Vulnerabilities and Exposures (CVE) descriptions and the corresponding source code. Specifically, VFFinder adopts a customized in-context learning (ICL) approach based on CVE description patterns to enable the LLM to extract key entities. It then performs priority matching against the source code to localize vulnerable functions. We evaluate VFFinder on 75 large open-source projects. The results show that VFFinder significantly outperforms existing baselines: Top-1 and MRR improve by 4.25X and 2.37X on average, respectively. We also integrate VFFinder with Software Composition Analysis (SCA) tools, and the results show that it significantly reduces the false positive rates of existing SCA tools.
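For readers unfamiliar with the two ranking metrics named in the abstract, the following is a minimal sketch of how Top-k accuracy and Mean Reciprocal Rank (MRR) are conventionally computed. It is not taken from the paper: the function names, the `ranks` input format, and the sample values are assumptions made for illustration only, and the paper's exact evaluation protocol may differ.

```python
from typing import Optional, Sequence


def top_k_accuracy(ranks: Sequence[Optional[int]], k: int = 1) -> float:
    """Fraction of vulnerabilities whose true vulnerable function is ranked in the top k.

    Each entry of `ranks` is the 1-based position of the ground-truth function
    in the tool's ranked output, or None if it was not returned at all.
    """
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all vulnerabilities; a miss (None) contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical ranks for four CVEs (illustrative values, not results from the paper).
ranks = [1, 3, None, 2]
top1 = top_k_accuracy(ranks, k=1)
mrr = mean_reciprocal_rank(ranks)
print(f"Top-1 = {top1:.3f}, MRR = {mrr:.3f}")
```

Under these definitions, a reported "4.25X" improvement in Top-1 would mean the ratio between the Top-1 values of two tools computed over the same set of vulnerabilities.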

Effective Vulnerable Function Identification Based on CVE Description Empowered by Large Language Models

Function-Level Vulnerability Detection Through Fusing Multi-Modal Knowledge

Fine-grained Commit-level Vulnerability Type Prediction by CWE Tree Structure

Patchmatch: A Tool for Locating Patches of Open Source Project Vulnerabilities

How Far Have We Gone in Vulnerability Detection Using Large Language Models

LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions

An Empirical Study of Automated Vulnerability Localization with Large Language Models

VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models

Combining Software Metrics and Text Features for Vulnerable File Prediction

Vulnerability Detection for Source Code Using Contextual LSTM

Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study

Attention Is All You Need for LLM-based Code Vulnerability Localization

VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching

Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection

Software vulnerable functions discovery based on code composite feature

SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection

VTT-LLM: Advancing Vulnerability-to-Tactic-and-Technique Mapping through Fine-Tuning of Large Language Model

Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities

DeepVulSeeker: A novel vulnerability identification framework via code graph structure and pre-training mechanism

On the Use of Fine-grained Vulnerable Code Statements for Software Vulnerability Assessment Models

Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++