Abstract: Bug-related research, e.g., fault localization, program repair, and software testing, relies heavily on high-quality, large-scale software bug repositories. The importance of such repositories is twofold. On the one hand, real-world bugs and their associated patches may inspire novel approaches for finding, locating, and repairing software bugs. On the other hand, real-world bugs and their patches are indispensable for rigorous and meaningful evaluation of approaches to software testing, fault localization, and program repair. To this end, a number of software bug repositories, e.g., iBUGS and Defects4J, have been constructed recently by mining version control systems and bug tracking systems. However, fully automated construction of bug repositories by simply taking bug-fixing commits from version control systems often results in inaccurate patches that contain many bug-irrelevant changes. Although experts or developers can be asked to manually exclude the bug-irrelevant changes (as the authors of Defects4J did), such extensive human intervention makes it difficult to build large-scale bug repositories. In this paper, we propose an automatic approach, called BugBuilder, to construct bug repositories from version control systems. Unlike existing approaches, it automatically extracts complete and concise bug-fixing patches and excludes bug-irrelevant changes. It first detects and excludes software refactorings involved in bug-fixing commits. BugBuilder then enumerates all subsets of the remaining changes and discards invalid subsets by compilation and software testing. If exactly one subset survives the validation, this subset is taken as the complete and concise bug-fixing patch for the associated bug. If multiple subsets survive, BugBuilder employs a sequence of heuristics to select the most likely one.
Evaluation results on 809 real-world bug-fixing commits in Defects4J suggest that BugBuilder successfully extracted complete and concise bug-fixing patches from 43% of the bug-fixing commits, and that its precision (99%) was even higher than that of human experts. We also built a bug repository, called GrowingBugs, with the proposed approach. The resulting repository serves as evidence of the usefulness of the proposed approach, as well as a publicly available benchmark for bug-related research.
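
The enumerate-and-validate step described in the abstract can be illustrated with a minimal Python sketch. It is not BugBuilder's actual implementation: the names `enumerate_subsets`, `extract_patch`, `is_valid`, and `pick` are hypothetical, `is_valid` stands in for "the subset compiles and passes the tests", and `pick` stands in for the paper's selection heuristics, which the abstract does not specify. Refactoring detection is assumed to have already been applied to the list of changes.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator, List, Optional, Sequence

Subset = FrozenSet[str]


def enumerate_subsets(changes: Sequence[str]) -> Iterator[Subset]:
    """Yield every non-empty subset of the changes, smallest subsets first."""
    for size in range(1, len(changes) + 1):
        for combo in combinations(changes, size):
            yield frozenset(combo)


def extract_patch(
    changes: Sequence[str],
    is_valid: Callable[[Subset], bool],
    pick: Optional[Callable[[List[Subset]], Subset]] = None,
) -> Optional[Subset]:
    """Return the candidate bug-fixing patch, or None if none can be chosen.

    is_valid: hypothetical stand-in for compiling and testing a subset.
    pick:     hypothetical stand-in for heuristics used when several
              subsets survive validation.
    """
    survivors = [s for s in enumerate_subsets(changes) if is_valid(s)]
    if not survivors:
        return None              # no subset compiles and passes the tests
    if len(survivors) == 1:
        return survivors[0]      # exactly one survivor: take it as the patch
    if pick is None:
        return None              # ambiguous, and no heuristic was supplied
    return pick(survivors)


if __name__ == "__main__":
    # Toy commit with three changes left after refactorings are excluded.
    changes = ["null_check", "log_message", "comment_update"]
    # Toy validity rule: only the null check alone "compiles and passes".
    print(extract_patch(changes, lambda s: s == frozenset({"null_check"})))
```

A commit with n remaining changes has 2^n - 1 non-empty subsets, so exhaustive enumeration of this kind is only practical when few changes remain after refactorings are excluded.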

GitBug-Java: A Reproducible Benchmark of Recent Java Bugs

An Empirical Study of Bugs in Build Process

The Future Can’t Help Fix the Past: Assessing Program Repair in the Wild

Beyond Fixing Bugs: Case Studies of Creative Collaboration in Open Source Software Bug Fixing Processes

Automatic Repair of Real Bugs in Java: A Large-Scale Experiment on the Defects4J Dataset

JITO: a Tool for Just-in-time Defect Identification and Localization

BugBuilder: An Automated Approach to Building Bug Repository

JaConTeBe: A Benchmark Suite of Real-World Java Concurrency Bugs (T)

BugsInPy: A Database of Existing Bugs in Python Programs to Enable Controlled Testing and Debugging Studies

BUMP: A Benchmark of Reproducible Breaking Dependency Updates

An Empirical Study on Real Bug Fixes

BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes

Extracting Concise Bug-Fixing Patches from Human-Written Patches in Version Control Systems

BUGSPHP: A dataset for Automated Program Repair in PHP

Compilation of Commit Changes within Java Source Code Repositories

A Study of Bug Resolution Characteristics in Popular Programming Languages

DroidBugs: An Android Benchmark for Automated Program Repair

Understanding and Finding Java Decompiler Bugs

Towards characterizing bug fixes through dependency-level changes in Apache Java open source projects

Detecting JVM JIT Compiler Bugs via Exploring Two-Dimensional Input Spaces

Identifying Bugs in Make and JVM-Oriented Builds