Abstract: In practice, some bugs have more impact than others and thus deserve more immediate attention. Due to tight schedules and limited human resources, developers may not have enough time to inspect all bugs, so they often concentrate on the bugs that are highly impactful. In the literature, the term high-impact bugs refers to bugs that appear at unexpected times or locations and cause more unexpected effects (i.e., surprise bugs), or that break pre-existing functionality and destroy the user experience (i.e., breakage bugs). Unfortunately, identifying high-impact bugs among the thousands of bug reports in a bug tracking system is not an easy feat. An automated technique that identifies high-impact bug reports can therefore help developers become aware of them early, rectify them quickly, and minimize the damage they cause. Because only a small proportion of bugs are high-impact, the identification of high-impact bug reports is a difficult, class-imbalanced task. In this paper, we propose an approach to identify high-impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of various variants, each of which combines one particular imbalanced learning strategy with one particular classification algorithm. In particular, we choose four widely used strategies for dealing with imbalanced data and four state-of-the-art text classification algorithms, and conduct experiments on four datasets from four different open source projects. We mainly perform an analytical study on two types of high-impact bugs, i.e., surprise bugs and breakage bugs. The results show that different variants perform differently, and that the best performing variants, SMOTE (synthetic minority over-sampling technique) + KNN (K-nearest neighbours) for surprise bug identification and RUS (random under-sampling) + NB (naive Bayes) for breakage bug identification, achieve higher F1-scores than the two state-of-the-art approaches by Thung et al. and Garcia and Shihab.
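To make the idea of a "variant" concrete, the following is a minimal sketch of one such combination, random under-sampling followed by a naive Bayes text classifier, written in plain Python. It is not the authors' implementation: the function and class names (`random_under_sample`, `NaiveBayes`), the whitespace tokenization, the Laplace smoothing, and the toy bug reports are all assumptions made for illustration, and the labels (1 = high-impact, 0 = other) are invented.

```python
import math
import random
from collections import Counter, defaultdict


def random_under_sample(texts, labels, seed=0):
    """Randomly drop samples so every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = min(len(items) for items in by_class.values())
    sampled_texts, sampled_labels = [], []
    for label, items in by_class.items():
        for text in rng.sample(items, target):
            sampled_texts.append(text)
            sampled_labels.append(label)
    return sampled_texts, sampled_labels


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens (Laplace smoothing)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            words = self.word_counts[label]
            denom = sum(words.values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for token in text.lower().split():
                if token in self.vocab:  # ignore tokens never seen in training
                    score += math.log((words[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written reports; 1 = high-impact, 0 = other.
reports = [
    ("crash after upgrade breaks saved sessions", 1),
    ("regression login no longer works after update", 1),
    ("typo in settings label", 0),
    ("adjust button color in toolbar", 0),
    ("update documentation wording", 0),
    ("minor alignment issue in dialog", 0),
    ("rename menu item label", 0),
    ("tooltip wording typo", 0),
]
texts = [text for text, _ in reports]
labels = [label for _, label in reports]

# Step 1: rebalance the training data. Step 2: train the classifier on it.
balanced_texts, balanced_labels = random_under_sample(texts, labels)
model = NaiveBayes().fit(balanced_texts, balanced_labels)
prediction = model.predict("regression breaks login")
```

Swapping either step (for example, an over-sampling strategy in place of `random_under_sample`, or a different classifier in place of `NaiveBayes`) yields another variant in the sense described above; the sketch only fixes one such pairing.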

High-Impact Bug Report Identification with Imbalanced Learning Strategies
