Abstract: Bug-related research, e.g., fault localization, program repair, and software testing, relies heavily on high-quality, large-scale software bug repositories. The importance of such repositories is twofold. On the one hand, real-world bugs and their associated patches may inspire novel approaches for finding, locating, and repairing software bugs. On the other hand, real-world bugs and their patches are indispensable for rigorous and meaningful evaluation of approaches to software testing, fault localization, and program repair. To this end, a number of software bug repositories, e.g., iBUGS and Defects4J, have been constructed recently by mining version control systems and bug tracking systems. However, fully automated construction of bug repositories by simply taking bug-fixing commits from version control systems often results in inaccurate patches that contain many bug-irrelevant changes. Although experts or developers can be asked to manually exclude the bug-irrelevant changes (as the authors of Defects4J did), such extensive human intervention makes it difficult to build large-scale bug repositories. In this paper, we therefore propose an automatic approach, called BugBuilder, to construct bug repositories from version control systems. Unlike existing approaches, it automatically extracts complete and concise bug-fixing patches and excludes bug-irrelevant changes. It first detects and excludes the software refactorings involved in bug-fixing commits. BugBuilder then enumerates all subsets of the remaining changes and discards invalid subsets by compilation and software testing. If exactly one subset survives the validation, this subset is taken as the complete and concise bug-fixing patch for the associated bug. If multiple subsets survive, BugBuilder employs a sequence of heuristics to select the most likely one.
Evaluation results on 809 real-world bug-fixing commits in Defects4J suggest that BugBuilder successfully extracted complete and concise bug-fixing patches from 43% of the bug-fixing commits, and that its precision (99%) was even higher than that of human experts. We also built a bug repository, called GrowingBugs, with the proposed approach. The resulting repository serves as evidence of the usefulness of the proposed approach, as well as a publicly available benchmark for bug-related research.
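
The subset enumeration and validation step described in the abstract can be illustrated with a minimal Python sketch. It is not BugBuilder's implementation: the names `candidate_patches`, `select_patch`, `is_valid`, and `tie_break` are hypothetical, `is_valid` stands in for "applies, compiles, and passes the tests", and `tie_break` is a placeholder for the heuristics, which the abstract does not specify. Refactorings are assumed to have been removed from `changes` beforehand.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional, Sequence

Change = str  # hypothetical stand-in for one change (e.g. a hunk) in a commit
Patch = FrozenSet[Change]


def candidate_patches(
    changes: Sequence[Change],
    is_valid: Callable[[Patch], bool],
) -> List[Patch]:
    """Enumerate every non-empty subset of changes and keep the valid ones.

    `is_valid` is an assumed callback standing in for compilation and testing.
    The enumeration is exponential in len(changes), so this only suits
    commits with a small number of remaining changes.
    """
    survivors: List[Patch] = []
    for size in range(1, len(changes) + 1):
        for subset in combinations(changes, size):
            patch = frozenset(subset)
            if is_valid(patch):
                survivors.append(patch)
    return survivors


def select_patch(
    changes: Sequence[Change],
    is_valid: Callable[[Patch], bool],
    tie_break: Optional[Callable[[List[Patch]], Patch]] = None,
) -> Optional[Patch]:
    """Return the single surviving subset, or defer to a tie-break heuristic.

    `tie_break` is a placeholder; the abstract only says that a sequence of
    heuristics picks the most likely subset when several survive.
    """
    survivors = candidate_patches(changes, is_valid)
    if len(survivors) == 1:
        return survivors[0]
    if survivors and tie_break is not None:
        return tie_break(survivors)
    return None  # no subset survived, or no heuristic was supplied
```

For example, with `changes = ["fix_null_check", "rename_var", "add_log"]` and an `is_valid` that accepts only the subset containing `fix_null_check` alone, `select_patch` returns that one-element subset.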

How Well Industry-Level Cause Bisection Works in Real-World: A Study on Linux Kernel

"Automated Debugging Considered Harmful" Considered Harmful: A User Study Revisiting the Usefulness of Spectra-Based Fault Localization Techniques with Professionals Using Real Bugs from Large Systems

An Empirical Study of Bugs in Industrial Financial Systems.

An Empirical Study of Bugs in Build Process

The Future Can’t Help Fix the Past: Assessing Program Repair in the Wild

Beyond fixing bugs: case studies of creative collaboration in open source software bug fixing processes.

Why and How Bug Blocking Relations Are Breakable: an Empirical Study on Breakable Blocking Bugs

An empirical study of the effectiveness of IR-based bug localization for large-scale industrial projects

An Empirical Study of Bug Fixing Rate.

Empirical Evaluation of Bug Linking

An Empirical Study on Critical Blocking Bugs.

A Comparative Study of Supervised Learning Algorithms for Re-opened Bug Prediction.

How Do Developers Fix Cross-Project Correlated Bugs? A Case Study on the GitHub Scientific Python Ecosystem

Impact Analysis of Cross-Project Bugs on Software Ecosystems

Examining the Effects of Developer Familiarity on Bug Fixing

BugBuilder: an Automated Approach to Building Bug Repository.

Industry Practice of Directed Kernel Fuzzing for Open-source Linux Distribution

Bug Inducing Analysis to Prevent Fault Prone Bug Fixes.

An Empirical Study of Refactoring Engine Bugs

Evaluating SZZ Implementations: An Empirical Study on the Linux Kernel

An Empirical Study on Downstream Workarounds for Cross-Project Bugs