On Rank Aggregating Test Prioritizations

Shouvick Mondal, Tse-Hsun Chen
2024-11-15
Abstract: Test case prioritization (TCP) has been an effective strategy to optimize regression testing. Traditionally, test cases are ordered based on some heuristic and rerun against the version under test with the goal of yielding a high failure throughput. Almost four decades of TCP research have seen extensive contributions on individual prioritization strategies. However, test case prioritization via preference aggregation has largely been unexplored. We envision this methodology as an opportunity to obtain robust prioritizations by consolidating multiple standalone ranked lists, i.e., performing a consensus. In this work, we propose Ensemble Test Prioritization (EnTP) as a three-stage pipeline involving: (i) ensemble selection, (ii) rank aggregation, and (iii) test case execution. We evaluate EnTP on 20 open-source C projects from the Software-artifact Infrastructure Repository and GitHub (totaling 694,512 SLOC, 280 versions, and 69,305 system-level test cases). We employ an ensemble of 16 standalone prioritization plans, four of which are imposed due to respective state-of-the-art approaches. We build EnTP on the foundations of Hansie, an existing framework for consensus prioritization, and show that EnTP's diversity-based ensemble selection budget of top-75% followed by rank aggregation can outperform both Hansie and the employed standalone prioritization approaches.
Software Engineering
### What problem does this paper attempt to solve?

This paper addresses Test Case Prioritization (TCP) in software regression testing. Specifically, the authors propose a new method, Ensemble Test Prioritization (EnTP), which optimizes regression testing by combining multiple independent prioritization strategies.

#### Main problem background:

1. **Importance of regression testing**:
   - Regression testing is an important part of software quality assurance. According to research, it may consume 80% of the overall test budget and account for 50% of software maintenance costs.
   - The execution order of test cases determines how quickly regression errors are detected, and therefore how short the feedback time is.
2. **Limitations of traditional methods**:
   - Traditional TCP methods rely on a single heuristic, and these heuristics perform differently in different scenarios, making it difficult to find one solution that is optimal in all situations.
   - A single prioritization strategy may lead to suboptimal results because no heuristic consistently produces the best failure detection rate.
3. **Deficiencies in existing research**:
   - Although there has been a large amount of research on TCP, test case prioritization through preference aggregation has not been fully explored.
   - Existing consensus prioritization frameworks (such as Hansie) offer improvements, but they leave room for improvement in diversity selection and aggregation.

#### Proposed solution:

The authors propose EnTP, a three-stage pipeline:

- **Ensemble selection**: Select a diverse subset from multiple independent prioritization strategies.
- **Rank aggregation**: Generate a consensus prioritization by performing rank aggregation on the selected diverse subset.
- **Test case execution**: Execute test cases according to the consensus prioritization.
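
The rank aggregation stage can be illustrated with a small example. The following is a minimal sketch, assuming a Borda count as the aggregation rule; it is not the paper's implementation, and the function name `borda_aggregate` and the test IDs are hypothetical. Each ranked list awards points to a test according to its position, and the consensus order sorts tests by total points.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine ranked lists of test IDs into one consensus order (Borda count).

    Assumes every ranking contains the same set of test IDs.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, test_id in enumerate(ranking):
            # The first test earns n - 1 points, the last earns 0.
            scores[test_id] += n - 1 - position
    # Highest total first; ties are broken by test ID for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Three hypothetical standalone prioritization plans over four tests.
plans = [
    ["t1", "t2", "t3", "t4"],
    ["t2", "t1", "t4", "t3"],
    ["t2", "t3", "t1", "t4"],
]
consensus = borda_aggregate(plans)
print(consensus)  # ['t2', 't1', 't3', 't4']
```

In the final stage, the tests would be executed in the order given by `consensus`.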
#### Key innovation points:

- **Diversity selection**: EnTP measures the differences between ranking strategies using the Kendall tau distance and selects the top k% most diverse strategies.
- **Application of social choice theory**: EnTP uses principles from social choice theory, such as the Kemeny-Young consensus and the Borda count method, for rank aggregation.
- **Empirical evaluation**: Extensive experiments on 20 open-source C projects demonstrate the superiority of EnTP on metrics such as APFD.

With this approach, EnTP can better adapt to different scenarios and provide a more robust test case prioritization scheme.
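
The diversity selection and the APFD metric mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the summary does not say how EnTP turns pairwise Kendall tau distances into a diversity score, so this sketch assumes that a ranking's diversity is its mean distance to all other rankings. The names `kendall_tau_distance`, `select_diverse`, and `apfd`, and all test and fault IDs, are hypothetical. APFD uses the standard definition `1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2 * n)`, where `TF_i` is the 1-based position of the first test that detects fault `i`, `n` is the number of tests, and `m` is the number of faults.

```python
import math
from itertools import combinations


def kendall_tau_distance(a, b):
    """Count the pairs of tests that rankings a and b order differently."""
    pos_a = {t: i for i, t in enumerate(a)}
    pos_b = {t: i for i, t in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def select_diverse(rankings, budget=0.75):
    """Keep the `budget` fraction of rankings with the largest mean
    Kendall tau distance to the others (an assumed diversity score)."""
    if len(rankings) < 2:
        return list(rankings)
    k = max(1, math.ceil(budget * len(rankings)))

    def mean_distance(i):
        others = [r for j, r in enumerate(rankings) if j != i]
        total = sum(kendall_tau_distance(rankings[i], r) for r in others)
        return total / len(others)

    # Most diverse first; ties are broken by original index.
    order = sorted(range(len(rankings)), key=lambda i: (-mean_distance(i), i))
    return [rankings[i] for i in sorted(order[:k])]


def apfd(order, faults):
    """Average Percentage of Faults Detected for one test order.

    `faults` maps each fault ID to the set of tests that detect it;
    every fault is assumed to be detected by at least one test in `order`.
    """
    n, m = len(order), len(faults)
    position = {t: i + 1 for i, t in enumerate(order)}
    first_hits = [min(position[t] for t in tests) for tests in faults.values()]
    return 1 - sum(first_hits) / (n * m) + 1 / (2 * n)


# Hypothetical ensemble of four rankings over four tests.
A = ["t1", "t2", "t3", "t4"]
B = ["t1", "t2", "t4", "t3"]
C = ["t4", "t3", "t2", "t1"]
D = ["t1", "t3", "t2", "t4"]
selected = select_diverse([A, B, C, D], budget=0.75)

# Hypothetical fault matrix: both faults are only caught by late tests in A.
faults = {"f1": {"t4"}, "f2": {"t3", "t4"}}
print(len(selected), apfd(A, faults), apfd(C, faults))
```

A higher APFD means faults are detected earlier in the run, so in this toy example the order `C` scores better than `A` because the fault-revealing tests come first.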