Abstract: Static Application Security Testing (SAST) tools are crucial for early bug detection and code quality but often generate false positives that slow development. Automating false positive mitigation is thus essential for advancing SAST tools. Past efforts use static/dynamic analysis or machine learning. The advent of Large Language Models (LLMs), adept at understanding natural language and code, offers promising ways to improve the accuracy and usability of SAST tools. However, existing LLM-based methods need improvement in two key areas: first, extracted code snippets related to warnings are often cluttered with irrelevant control and data flows, reducing precision; second, critical code context is often missing, leading to incomplete representations that can mislead LLMs and cause inaccurate assessments. To ensure the use of precise and complete code context, thereby avoiding misguidance and enabling LLMs to reach accurate conclusions, we propose LLM4FPM. One of its core components is eCPG-Slicer, which builds an extended code property graph and extracts line-level, precise code context. Moreover, LLM4FPM incorporates the FARF algorithm, which builds a file reference graph and then efficiently detects all files related to a warning in linear time, enabling eCPG-Slicer to gather complete code context across these files. We evaluate LLM4FPM on the Juliet dataset, where it comprehensively outperforms the baseline, achieving an F1 score above 99% across various CWEs. LLM4FPM leverages a free, open-source model, avoiding costly alternatives and reducing inspection costs by up to $2758 per run on Juliet, with an average inspection time of 4.7 seconds per warning. Our work emphasizes the critical impact of precise and complete code context and highlights the potential of combining program analysis with LLMs, improving the quality and efficiency of software development.
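The abstract contrasts snippets cluttered with irrelevant control and data flows against precise, line-level context. As a rough illustration of that idea, the sketch below performs a line-level backward slice over a toy dependence graph. It is a simplification under stated assumptions: the paper's eCPG is a richer extended code property graph, and the `DependenceGraph` alias, the `backward_slice` function, and the toy C fragment are all hypothetical, not part of LLM4FPM.

```python
from collections import deque

# Hypothetical toy dependence graph: each source line maps to the lines it
# depends on through data or control dependence. This is NOT the paper's eCPG.
DependenceGraph = dict[int, list[int]]


def backward_slice(deps: DependenceGraph, warning_line: int) -> list[int]:
    """Return, sorted, every line the warning line transitively depends on."""
    seen = {warning_line}
    queue = deque([warning_line])
    while queue:  # breadth-first walk over dependence edges
        line = queue.popleft()
        for dep in deps.get(line, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


# Toy C fragment (hypothetical), one entry per line:
# 1: int n = read_input();
# 2: int unused = 42;
# 3: if (n < 10)
# 4:     buf[n] = 0;        <- line flagged by the SAST tool
# 5: log_value(unused);
deps: DependenceGraph = {
    1: [],
    2: [],
    3: [1],     # condition reads n (data dependence on line 1)
    4: [1, 3],  # uses n (data) and is guarded by line 3 (control)
    5: [2],
}
print(backward_slice(deps, 4))  # [1, 3, 4]
```

Lines 2 and 5 are dropped because the flagged line does not depend on them. This mirrors the kind of pruning the abstract describes, although the real slicer operates on a far richer graph than this adjacency map.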

What problem does this paper attempt to address?

The paper addresses the large number of false positives that Static Application Security Testing (SAST) tools produce when detecting code vulnerabilities. Although SAST tools are crucial for early error detection and for improving code quality and efficiency in modern software development, they often report many false positives, each of which requires manual review and thus slows development. Automated False Positive Mitigation (FPM) is therefore critical to improving the usefulness of SAST tools.

The paper identifies two main limitations of existing methods based on large language models (LLMs):

1. **Extracted code snippets are too broad and noisy**: the code extracted for a warning usually contains a lot of irrelevant control-flow and data-flow information, which reduces precision.
2. **Key code context is missing**: important context (such as global variables or the definitions of external functions) is often left out, producing an incomplete representation that misleads the LLM and leads to inaccurate assessments.

To solve these problems, the authors propose LLM4FPM, a framework that uses precise and complete code context to guide LLMs in automatic false positive mitigation. Its core components are:

- **eCPG-Slicer**: builds an Extended Code Property Graph (eCPG) and extracts from it the line-level code context relevant to the warning.
- **FARF algorithm**: builds a file reference graph and identifies its strongly connected components to detect, in linear time, all files related to a given warning, so that the code context is complete.

With these improvements, LLM4FPM can more accurately determine whether a warning from a SAST tool is a false positive, reducing the number of false positives developers must review and improving development efficiency and code quality.
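The FARF description (file reference graph, strongly connected components, linear time) can be illustrated with a standard construction: Tarjan's SCC algorithm, a condensation DAG, and a reachability walk from the warning's file, each O(V + E). This is a sketch under assumptions, not FARF itself: the summary does not say exactly which files count as "related", so here a file is assumed related if it is transitively referenced by the warning's file. `RefGraph`, `tarjan_scc`, `related_files`, and the example file names are hypothetical.

```python
from collections import defaultdict

# Hypothetical file reference graph: each file maps to the files whose symbols
# it references (for example via #include, extern functions, or globals).
RefGraph = dict[str, list[str]]


def tarjan_scc(graph: RefGraph) -> dict[str, int]:
    """Map each file to a strongly connected component id (Tarjan, O(V + E)).

    Recursive for brevity; very deep graphs would need an iterative version.
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    index: dict[str, int] = {}
    low: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    comp: dict[str, int] = {}
    next_index = 0
    next_comp = 0

    def visit(v: str) -> None:
        nonlocal next_index, next_comp
        index[v] = low[v] = next_index
        next_index += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop its members
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp[w] = next_comp
                if w == v:
                    break
            next_comp += 1

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return comp


def related_files(graph: RefGraph, warning_file: str) -> set[str]:
    """Files transitively referenced from the warning's file (including itself)."""
    comp = tarjan_scc(graph)
    if warning_file not in comp:  # file does not appear in the graph at all
        return {warning_file}
    members: dict[int, set[str]] = defaultdict(set)
    for f, c in comp.items():
        members[c].add(f)
    # Condensation DAG: one node per SCC, edges only between different SCCs.
    dag: dict[int, set[int]] = defaultdict(set)
    for u, targets in graph.items():
        for v in targets:
            if comp[u] != comp[v]:
                dag[comp[u]].add(comp[v])
    # Iterative DFS over the DAG, starting from the warning file's component.
    start = comp[warning_file]
    seen, todo = {start}, [start]
    while todo:
        for d in dag[todo.pop()]:
            if d not in seen:
                seen.add(d)
                todo.append(d)
    return set().union(*(members[c] for c in seen))


refs: RefGraph = {
    "main.c": ["util.c"],
    "util.c": ["io.c"],
    "io.c": ["util.c"],  # util.c and io.c reference each other: one SCC
    "test.c": ["main.c"],
    "unrelated.c": [],
}
print(sorted(related_files(refs, "main.c")))  # ['io.c', 'main.c', 'util.c']
```

For a single warning, a plain DFS from the warning file would return the same set; the SCC step pays off when the component map is computed once per project and reused across many warnings, since all files in one component share the same related set.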
### Specific contributions

1. **A line-level precise code slicer, eCPG-Slicer**: it builds an Extended Code Property Graph (eCPG) and extracts the line-level code context related to a warning from a given file.
2. **A linear-complexity algorithm, FARF**: it identifies the source files related to a warning, enabling the slicer to extract complete code context.
3. **Integration of eCPG-Slicer and FARF into the LLM4FPM framework**: the framework efficiently drives LLMs to judge the warnings generated by SAST tools.

### Evaluation results

The paper evaluates LLM4FPM on the Juliet dataset, where it achieves an F1 score above 99% across various Common Weakness Enumerations (CWEs); the false positive rate in real-world projects is reduced by more than 85%. LLM4FPM is also inexpensive to run: it uses a free, open-source model, takes 4.7 seconds per warning on average, and reduces inspection costs by up to $2,758 per run on Juliet.

In conclusion, the paper emphasizes the importance of precise and complete code context and shows the potential of combining program analysis with LLMs, advancing automatic vulnerability analysis and improving the quality and efficiency of modern software development.
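For reference, the F1 score reported in the evaluation is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The helper below computes all three from confusion counts. The function name and the counts in the usage line are illustrative only, not figures from the paper, and which warning class the paper treats as "positive" is not stated in this summary.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts, not taken from the paper.
print([round(x, 3) for x in precision_recall_f1(99, 1, 1)])  # [0.99, 0.99, 0.99]
```

Because F1 is a harmonic mean, a score above 99% requires both precision and recall to be high; a detector that is strong on only one of them cannot reach it.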

Utilizing Precise and Complete Code Context to Guide LLM in Automatic False Positive Mitigation
