An Empirical Study of False Negatives and Positives of Static Code Analyzers From the Perspective of Historical Issues

Han Cui, Menglei Xie, Ting Su, Chengyu Zhang, Shin Hwei Tan
2024-08-25
Abstract: Static code analyzers are widely used to help find program flaws. However, in practice the effectiveness and usability of such analyzers are affected by the problems of false negatives (FNs) and false positives (FPs). This paper investigates the FNs and FPs of such analyzers from a new perspective, i.e., by examining the historical FN and FP issues that maintainers, users, and researchers reported in the analyzers' issue repositories. Each of these issues manifested as an FN or FP of the analyzers at some point in their history and has already been confirmed and fixed by the analyzers' developers. To this end, we conduct the first systematic study on a broad range of 350 historical issues of FNs/FPs from three popular static code analyzers (i.e., PMD, SpotBugs, and SonarQube). All these issues have been confirmed and fixed by the developers. We investigated these issues' root causes and the characteristics of the corresponding issue-triggering programs. The study reveals several new and interesting findings and implications for mitigating FNs and FPs. Furthermore, guided by some findings of our study, we designed a metamorphic testing strategy to find FNs and FPs. This strategy successfully found 14 new issues of FNs/FPs, 11 of which have been confirmed and 9 of which have already been fixed by the developers. Our further manual investigation of the studied analyzers revealed one rule specification issue and four additional FNs/FPs due to weaknesses of the implemented static analysis. We have made all the artifacts (datasets and tools) publicly available at https://zenodo.org/doi/10.5281/zenodo.11525129.
Software Engineering
What problem does this paper attempt to address?
This paper addresses the problem of false negatives (FNs) and false positives (FPs) in static code analyzers. The authors approach it from a new perspective: they examine the analyzers' historical issue records to identify the root causes of FNs and FPs and the characteristics of the programs that trigger them. The main objectives of the study are:

1. **Investigate the root causes of FNs and FPs**: By analyzing 350 historical issues from three popular static code analyzers (PMD, SpotBugs, and SonarQube), identify the specific reasons that lead to FNs and FPs.
2. **Identify the characteristics of input programs**: Determine which characteristics of input programs lead to FNs and FPs, thereby helping developers design better testing strategies.
3. **Validate the practical value of the findings**: Guided by the findings, design a metamorphic testing strategy and use it to discover new FNs and FPs. It found 14 new issues, 11 of which have been confirmed and 9 already fixed by the developers.

The study focuses on three main sources of FNs and FPs:

- **Incorrect rule specifications**: Defects in the rule descriptions themselves lead to false positives or false negatives.
- **Inconsistent rule implementation**: The implementation of a rule does not match its specification, resulting in incorrect detection results.
- **Unaddressed language features and libraries**: Certain language features or libraries are not handled correctly, leading to false positives or false negatives.

Through this systematic study, the authors hope to give developers of static code analyzers insights that improve the effectiveness and usability of their tools.
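
To make the idea of metamorphic testing for static analyzers concrete, the following is a minimal Python sketch. It is not the authors' tool. The regex-based `toy_analyzer`, the `swap_operands` transformation, and the sample Java snippet are all hypothetical stand-ins chosen for illustration. The sketch assumes the metamorphic relation "a semantics-preserving rewrite of the input program should not change the set of reported warnings"; a mismatch between the original and the variant hints at an FN or FP in the analyzer.

```python
import re

# Toy rule (hypothetical): flag string comparisons of the form `name == "literal"`.
# It deliberately ignores the mirrored form `"literal" == name`, which stands in
# for a rule implementation that misses an equivalent coding pattern.
_RULE = re.compile(r'\b\w+\s*==\s*"[^"]*"')

# Pattern used by the transformation to locate `name == "literal"` comparisons.
_SWAP = re.compile(r'\b(\w+)\s*==\s*("[^"]*")')


def toy_analyzer(source):
    """Return the 1-based line numbers on which the toy rule reports a warning."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if _RULE.search(line)
    ]


def swap_operands(source):
    """Semantics-preserving variant: rewrite `x == "lit"` as `"lit" == x`."""
    return _SWAP.sub(r"\2 == \1", source)


def check_metamorphic_relation(source):
    """Run the analyzer on the program and its variant and compare the warnings."""
    original = toy_analyzer(source)
    variant = toy_analyzer(swap_operands(source))
    return {
        "original": original,
        "variant": variant,
        "consistent": original == variant,
    }


# Sample input program (Java source held as a string; assumed for illustration).
PROGRAM = '''class Demo {
    boolean isAdmin(String role) {
        return role == "admin";
    }
}'''

result = check_metamorphic_relation(PROGRAM)
print(result)
```

Here the toy rule reports the comparison on line 3 of the original program but reports nothing for the operand-swapped variant, so the relation is violated and the variant exposes a false negative of the toy rule. Against a real analyzer, the comparison would be made on the analyzer's actual report, and each inconsistency would still need manual inspection to decide whether it is an FN or an FP.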