Contrasting test selection, prioritization, and batch testing at scale

Emad Fallahzadeh, Peter C. Rigby, Bram Adams
DOI: https://doi.org/10.1007/s10664-024-10589-8
IF: 3.762
2024-11-30
Empirical Software Engineering
Abstract: The effectiveness of software testing is crucial for successful software releases, and various test optimization techniques aim to enhance this process by reducing the number of test executions or prioritizing potential test failures. Although different families of techniques exist, each with its own evaluation criteria, few studies have compared these different lines of research. This study addresses this gap by empirically comparing Yaraghi et al.'s test prioritization approach, Zhu et al.'s cross-build test prioritization and its equivalent test selection technique, and our BatchAll test batching algorithm. To evaluate these test optimization approaches, we empirically analyze millions of test results from Google Chrome, along with pre- and post-commit test outcomes for a Google project, as well as the JMRI Travis CI dataset. Findings reveal that test selection can reduce actual median feedback time by up to 96% with the same number of machines but may miss up to 55% of failures. In contrast, batching achieves up to a 99% reduction in feedback time without missing any failures. Test selection cuts machine usage by up to 66%, while batching achieves up to an 88% reduction. For failure detection, test selection is up to 62 minutes faster than the baseline, and the batching algorithm achieves up to a 63-minute median improvement without missing failures. Regarding test execution time, test selection saves up to 66%, whereas batching's savings can reach up to 98%, although its performance varies based on the machines used. The studied test prioritization algorithms significantly underperform compared to the test selection and batching algorithms. In conclusion, this study provides practical recommendations for selecting appropriate test optimization algorithms based on the testing environment and failure loss tolerance.
Subjects: computer science, software engineering