Automatically Inspecting Thousands of Static Bug Warnings with Large Language Model: How Far Are We?

Cheng Wen, Yuandao Cai, Bin Zhang, Jie Su, Zhiwu Xu, Dugang Liu, Shengchao Qin, Zhong Ming, Cong Tian

DOI: https://doi.org/10.1145/3653718

Impact factor: 4.157

2024-03-26

ACM Transactions on Knowledge Discovery from Data

Abstract: Static analysis tools for capturing bugs and vulnerabilities in software programs are widely employed in practice, as they have the unique advantages of high coverage and independence from the execution environment. However, existing tools for analyzing large codebases often produce more false warnings than genuine bug reports. As a result, developers are required to manually inspect and confirm each warning, a challenging, time-consuming, and automation-essential task. This paper advocates a fast, general, and easily extensible approach called Llm4sa that automatically inspects a large volume of static warnings by harnessing (some of) the powers of Large Language Models (LLMs). Our key insight is that LLMs have advanced program understanding capabilities, enabling them to effectively act as human experts in conducting manual inspections on bug warnings with their relevant code snippets. In this spirit, we propose a static analysis to effectively extract the relevant code snippets via program dependence traversal guided by the bug warning reports themselves. Then, by formulating customized questions that are enriched with domain knowledge and representative cases to query LLMs, Llm4sa can remove a large number of false warnings and facilitate bug discovery significantly. Our experiments demonstrate that Llm4sa is practical in automatically inspecting thousands of static warnings from Juliet benchmark programs and 11 real-world C/C++ projects, showcasing high precision (81.13%) and high recall (94.64%) for a total of 9,547 bug warnings. Our research introduces new opportunities and methodologies for using LLMs to reduce human labor costs, improve the precision of static analyzers, and ensure software trustworthiness.
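The precision and recall figures above follow the usual definitions. The abstract does not say which class is treated as positive; assuming a warning that Llm4sa keeps as a genuine bug counts as a positive, the sketch below spells out the formulas and the F1 score that the two reported numbers would imply. The helper names are illustrative only.

```python
def precision(tp: int, fp: int) -> float:
    """Share of warnings kept as real bugs that truly are bugs: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of truly buggy warnings that were kept as real bugs: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Precision and recall as reported in the abstract, written as fractions.
reported_p, reported_r = 0.8113, 0.9464
print(f"Implied F1: {f1(reported_p, reported_r):.3f}")
```

Running it prints an implied F1 of about 0.874. That value is plain arithmetic on the two reported percentages, not a figure stated in the abstract.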

computer science, information systems, software engineering

### What problem does this paper attempt to solve?

This paper addresses the problem that static analysis tools generate a large number of false positives on large-scale codebases. Specifically:

1. **Limitations of static analysis tools**:
   - Although static analysis tools offer high coverage and independence from the execution environment, they often produce a large number of false positives when analyzing large-scale codebases.
   - Developers must then manually inspect and confirm each warning, a time-consuming task that requires expert knowledge.
2. **Deficiencies of existing solutions**:
   - Dynamic analysis methods (such as directed grey-box fuzzing and dynamic symbolic execution) can screen out real vulnerabilities, but they are very time-consuming and hard to apply effectively to thousands of bug reports.
   - Pattern-recognition-based methods (such as machine-learning techniques) can predict false positives, but they show low recall in practice and struggle to reflect real-world conditions.
3. **The proposed method**:
   - The paper proposes Llm4sa, which uses large language models (LLMs) to automatically inspect large numbers of static analysis warnings.
   - Llm4sa addresses the problems above as follows:
     - **Code snippet extraction**: it extracts the code snippets relevant to each bug report via program-dependence traversal, so that the input fits within the LLM's input length limit.
     - **Prompt engineering**: it designs effective prompts that help the LLM understand static warnings and reason about them.
     - **Pre-processing and post-processing**: it converts bug reports produced by different static analysis tools into a unified format and determines the confidence level of the LLM's answers.
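Code snippet extraction by program-dependence traversal can be pictured as a backward walk from the warned line along data and control dependences. The sketch below assumes the program dependence graph has already been built by some front end (not shown) and is given as a plain dict from each line to the lines it depends on; `extract_snippet`, its `max_depth` bound, and the C fragment are hypothetical illustrations, not Llm4sa's implementation.

```python
from collections import deque
from typing import Dict, List

# Hypothetical program dependence graph: each line number maps to the lines it
# depends on through data or control dependence (building it is not shown).
Pdg = Dict[int, List[int]]


def extract_snippet(pdg: Pdg, source: Dict[int, str], warning_line: int,
                    max_depth: int = 3) -> str:
    """Backward breadth-first traversal from the warned line, bounded by max_depth hops."""
    visited = {warning_line}
    queue = deque([(warning_line, 0)])
    while queue:
        line, depth = queue.popleft()
        if depth == max_depth:
            continue  # stop expanding to keep the snippet short
        for dep in pdg.get(line, []):
            if dep not in visited:
                visited.add(dep)
                queue.append((dep, depth + 1))
    # Emit the collected statements in source order so the LLM sees readable code.
    return "\n".join(f"{n}: {source[n]}" for n in sorted(visited) if n in source)


# Toy C fragment; the analyzer warns about a possible NULL dereference on line 5.
source = {
    1: "char *buf = malloc(len);",
    2: "int n = read_input();",
    3: 'printf("%d", n);',
    4: "if (buf != NULL)",
    5: "buf[0] = 'a';",
}
pdg = {5: [1, 4], 4: [1], 3: [2]}  # line 5: data dep on 1, control dep on 4
print(extract_snippet(pdg, source, warning_line=5))
```

The slice keeps the NULL check on line 4 and drops the unrelated `printf`; with the guard visible, a reader (human or LLM) has what it needs to judge the warning a likely false positive, while the prompt stays short.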
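Pre-processing and prompt construction can be sketched together: reports from different tools are normalized into one record, which is then turned into a question enriched with domain knowledge and a representative case. This assumes a GCC/Clang-style `file:line:col: warning: message [checker]` text line as one input and a hypothetical JSON-style record (keys `path`, `lineno`, `kind`, `desc`) as another; the `BugWarning` fields and the prompt wording are illustrative assumptions, not Llm4sa's actual templates.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BugWarning:
    """Tool-independent warning record (this unified schema is an assumption)."""
    file: str
    line: int
    bug_type: str
    message: str


# GCC/Clang-style diagnostic: file:line:col: warning: message [checker]
_DIAG = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):\d+: warning: (?P<msg>.*?) \[(?P<checker>[^\]]+)\]$"
)


def from_diagnostic(text: str) -> BugWarning:
    m = _DIAG.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized diagnostic: {text!r}")
    return BugWarning(m["file"], int(m["line"]), m["checker"], m["msg"])


def from_record(rec: dict) -> BugWarning:
    # Hypothetical JSON-style report with keys "path", "lineno", "kind", "desc".
    return BugWarning(rec["path"], int(rec["lineno"]), rec["kind"], rec["desc"])


def build_prompt(w: BugWarning, snippet: str, knowledge: str, example: str) -> str:
    """Assemble one question enriched with domain knowledge and a representative case."""
    return (
        f"Background on {w.bug_type}: {knowledge}\n\n"
        f"Representative case:\n{example}\n\n"
        f"A static analyzer reports '{w.message}' at {w.file}:{w.line}.\n"
        f"Relevant code:\n{snippet}\n\n"
        "Start your answer with 'real bug' or 'false positive', then explain."
    )


a = from_diagnostic("main.c:5:7: warning: Dereference of null pointer [core.NullDereference]")
b = from_record({"path": "main.c", "lineno": "5", "kind": "core.NullDereference",
                 "desc": "Dereference of null pointer"})
assert a == b  # both report formats now yield the same record
print(build_prompt(a, "5: buf[0] = 'a';",
                   knowledge="a pointer must be non-NULL when dereferenced.",
                   example="p = malloc(n); p[0] = 1;  -> real bug (result unchecked)"))
```

Normalizing first means the prompt template, and everything downstream of it, only has to handle one record type regardless of which analyzer produced the warning.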
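The summary says only that post-processing determines the confidence level of the LLM's answers, not how. One common technique, assumed here and not confirmed as Llm4sa's, is to ask the same question several times and treat agreement among the answers (majority voting) as the confidence. The label strings, the assumption that the model opens its reply with a label, and the 0.6 threshold are illustrative choices.

```python
from collections import Counter
from typing import List, Optional, Tuple

LABELS = ("real bug", "false positive")


def parse_verdict(answer: str) -> Optional[str]:
    """Read the label the model was asked to put first; None if it did not comply."""
    text = answer.strip().lower()
    for label in LABELS:
        if text.startswith(label):
            return label
    return None


def aggregate(answers: List[str], threshold: float = 0.6) -> Tuple[str, float]:
    """Majority vote over repeated answers to the same prompt.

    Confidence is the winning label's share of *all* answers, so unparseable
    replies lower it; anything under the threshold goes to manual review.
    """
    votes = Counter(v for v in map(parse_verdict, answers) if v is not None)
    if not votes:
        return "manual review", 0.0
    verdict, count = votes.most_common(1)[0]
    confidence = count / len(answers)
    return (verdict if confidence >= threshold else "manual review"), confidence


# Four hypothetical replies to the same prompt.
replies = [
    "False positive: buf is checked against NULL on line 4.",
    "false positive, the dereference is guarded.",
    "Real bug: malloc may return NULL.",
    "FALSE POSITIVE",
]
print(aggregate(replies))
```

Routing low-agreement cases to a human keeps the filter conservative: a warning is only dismissed automatically when the model answers consistently.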
### Summary

The core goal of this paper is to improve the precision of static analysis tools and reduce the number of false positives, thereby lightening the burden on developers. By introducing LLMs, Llm4sa can automatically inspect thousands of static warnings in large-scale codebases, significantly improving the efficiency and accuracy of static analysis.
