Automatically Inspecting Thousands of Static Bug Warnings with Large Language Model: How Far Are We?

Cheng Wen, Yuandao Cai, Bin Zhang, Jie Su, Zhiwu Xu, Dugang Liu, Shengchao Qin, Zhong Ming, Cong Tian
DOI: https://doi.org/10.1145/3653718
IF: 4.157
2024-03-26
ACM Transactions on Knowledge Discovery from Data
Abstract: Static analysis tools for capturing bugs and vulnerabilities in software programs are widely employed in practice, as they have the unique advantages of high coverage and independence from the execution environment. However, existing tools for analyzing large codebases often produce a great deal of false warnings over genuine bug reports. As a result, developers are required to manually inspect and confirm each warning, a challenging, time-consuming, and automation-essential task. This paper advocates a fast, general, and easily extensible approach called Llm4sa that automatically inspects a sheer volume of static warnings by harnessing (some of) the powers of Large Language Models (LLMs). Our key insight is that LLMs have advanced program understanding capabilities, enabling them to effectively act as human experts in conducting manual inspections on bug warnings with their relevant code snippets. In this spirit, we propose a static analysis to effectively extract the relevant code snippets via program dependence traversal guided by the bug warnings reports themselves. Then, by formulating customized questions that are enriched with domain knowledge and representative cases to query LLMs, Llm4sa can remove a great deal of false warnings and facilitate bug discovery significantly. Our experiments demonstrate that Llm4sa is practical in automatically inspecting thousands of static warnings from Juliet benchmark programs and 11 real-world C/C++ projects, showcasing a high precision (81.13%) and a recall rate (94.64%) for a total of 9,547 bug warnings. Our research introduces new opportunities and methodologies for using the LLMs to reduce human labor costs, improve the precision of static analyzers, and ensure software trustworthiness.
computer science, information systems, software engineering
### What problem does this paper attempt to solve?

This paper addresses the large number of false positives that static analysis tools produce on large-scale codebases. Specifically:

1. **Limitations of static analysis tools**:
   - Static analysis tools offer high coverage and independence from the execution environment, but they often report many false positives when analyzing large-scale codebases.
   - Developers must manually inspect and confirm each warning, which is time-consuming and requires expert knowledge.
2. **Deficiencies of existing solutions**:
   - Dynamic analysis methods (such as directed grey-box fuzzing and dynamic symbolic execution) can confirm real vulnerabilities, but they are very time-consuming and hard to apply effectively to thousands of bug warnings.
   - Pattern-recognition-based methods (such as machine-learning techniques) can predict false positives, but they achieve low recall in practice and do not reflect real-world conditions well.
3. **The proposed new method**:
   - The paper proposes Llm4sa, a method that uses large language models (LLMs) to automatically inspect a large number of static analysis warnings.
   - Llm4sa addresses the problems above in the following ways:
     - **Code snippet extraction**: Extract the code snippets relevant to each bug warning through program-dependence traversal, so that the input stays within the length limit of LLMs.
     - **Prompt engineering**: Design effective prompts that help LLMs understand static warnings and reason about them.
     - **Pre-processing and post-processing**: Convert the warnings generated by different static analysis tools into a unified format, and determine the confidence level of the LLMs' answers.
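The three steps above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `StaticWarning` schema, the hand-written dependence map, the prompt wording, and the majority-vote confidence are all assumptions made for illustration, and no real LLM is called (the answers passed to `vote` are placeholder strings).

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticWarning:
    """Unified warning record (hypothetical schema, not taken from the paper)."""
    tool: str
    bug_type: str
    file: str
    line: int


def backward_slice(deps, start, max_lines=50):
    """Collect the lines that `start` transitively depends on, breadth-first.

    `deps` maps a line number to the lines it depends on (data or control).
    `max_lines` caps the snippet size so the prompt stays within an input limit.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dep in deps.get(current, ()):
            if dep not in seen and len(seen) < max_lines:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


def build_prompt(warning, source_lines, slice_lines):
    """Assemble a question for the model from the warning and the sliced code."""
    snippet = "\n".join(f"{n}: {source_lines[n]}" for n in slice_lines)
    return (
        f"A static analyzer ({warning.tool}) reports a possible "
        f"{warning.bug_type} at {warning.file}:{warning.line}.\n"
        f"Relevant code:\n{snippet}\n"
        "Is this a real bug? Answer 'yes' or 'no'."
    )


def vote(answers):
    """Return (verdict, confidence) by majority vote over repeated answers."""
    if not answers:
        raise ValueError("at least one answer is required")
    yes = sum(a.strip().lower().startswith("yes") for a in answers)
    no = len(answers) - yes
    verdict = "bug" if yes > no else "false alarm"  # ties count as false alarm
    return verdict, max(yes, no) / len(answers)


# Toy input: line 5 uses `p` after line 3 frees it; lines 2 and 4 are unrelated.
source = {
    1: "char *p = malloc(8);",
    2: "int n = read_len();",
    3: "free(p);",
    4: "log_len(n);",
    5: "p[0] = 'a';",
}
deps = {5: [3, 1], 3: [1]}  # hand-written dependence edges for the toy input
warning = StaticWarning("hypothetical-analyzer", "use-after-free", "demo.c", 5)

lines = backward_slice(deps, warning.line)
prompt = build_prompt(warning, source, lines)
print(prompt)
print(vote(["Yes, p is freed on line 3.", "yes", "No"]))
```

The traversal keeps only the lines the warning depends on, so the unrelated lines 2 and 4 never reach the prompt. In a real tool the dependence edges would come from a program analysis rather than a hand-written map.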
### Summary

The core problem of this paper is to improve the precision of static analysis tools and reduce the number of false positives, thereby reducing the burden on developers. By introducing LLMs, Llm4sa can automatically inspect thousands of static warnings in large-scale codebases, significantly improving the efficiency and accuracy of static analysis.