PACKHUNTER: Recovering Missing Packages for C/C++ Projects

Rongxin Wu, Zhiling Huang, Zige Tian, Chengpeng Wang, Xiangyu Zhang
DOI: https://doi.org/10.1109/tse.2024.3506629
IF: 7.4
2024-01-01
IEEE Transactions on Software Engineering
Abstract: The reproducibility of software artifacts is a critical aspect of software development and application. However, current research indicates that a notable proportion of C/C++ projects encounter non-reproducibility issues stemming from build failures, primarily attributed to the absence of necessary packages. This paper introduces PACKHUNTER, a novel technique that automates the recovery of missing packages in C/C++ projects. By identifying missing files during the project's build process, PACKHUNTER determines which packages are potentially missing and synthesizes an installation script. Specifically, it simplifies C/C++ projects through program reduction to lower build overhead, and it simulates the presence of missing files via a mock build so that the build succeeds and further missing files can be probed. In addition, PACKHUNTER eliminates packages that do not contain the required missing files, which effectively reduces the search space. Furthermore, PACKHUNTER uses a greedy strategy to prioritize the remaining packages, recovering the missing packages with only a few rounds of package enumeration. We have implemented PACKHUNTER as a tool and evaluated it on 30 real-world projects. The results demonstrate that PACKHUNTER can recover missing packages efficiently, achieving a 26.59× speedup over the state-of-the-art approach. The effectiveness of PACKHUNTER highlights its potential to assist developers in building C/C++ artifacts and to promote software reproducibility.
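The filtering and greedy prioritization steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the greedy criterion, so the sketch assumes a simple set-cover heuristic (pick the package that provides the most still-missing files). The package names, file names, the `index` mapping, and the use of `apt-get` as the package manager are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def filter_candidates(missing: Set[str], index: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Drop every package that provides none of the missing files.

    `index` is a hypothetical mapping from package name to the files it ships.
    """
    return {pkg: files & missing for pkg, files in index.items() if files & missing}


def greedy_pick(missing: Set[str], index: Dict[str, Set[str]]) -> List[str]:
    """Assumed greedy heuristic: repeatedly choose the package that covers
    the largest number of still-missing files."""
    candidates = filter_candidates(missing, index)
    remaining = set(missing)
    chosen: List[str] = []
    while remaining and candidates:
        # Ties are broken alphabetically so the result is deterministic.
        best = min(candidates, key=lambda p: (-len(candidates[p] & remaining), p))
        covered = candidates[best] & remaining
        if not covered:
            break  # no remaining candidate provides any still-missing file
        chosen.append(best)
        remaining -= covered
        del candidates[best]
    return chosen


def install_script(packages: List[str]) -> str:
    """Render a one-line install command (apt-get is an assumption here)."""
    return "apt-get install -y " + " ".join(packages)
```

For example, with missing files `{"foo.h", "bar.h", "baz.h"}` and an index in which `libfoo-dev` ships `foo.h` and `bar.h`, `libbar-dev` ships `bar.h`, and `libbaz-dev` ships `baz.h`, the sketch selects `libfoo-dev` first and then `libbaz-dev`, skipping `libbar-dev` because it adds nothing new.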