Towards Explainable Test Case Prioritisation with Learning-to-Rank Models

Aurora Ramírez, Mario Berrios, José Raúl Romero, Robert Feldt
DOI: https://doi.org/10.1109/ICSTW58534.2023.00023
2024-05-23
Abstract: Test case prioritisation (TCP) is a critical task in regression testing to ensure quality as software evolves. Machine learning has become a common way to achieve it. In particular, learning-to-rank (LTR) algorithms provide an effective method of ordering and prioritising test cases. However, their use poses a challenge in terms of explainability, both globally at the model level and locally for particular results. Here, we present and discuss scenarios that require different explanations and how the particularities of TCP (multiple builds over time, test case and test suite variations, etc.) could influence them. We include a preliminary experiment to analyse the similarity of explanations, showing that they do not only vary depending on test case-specific predictions, but also on the relative ranks.
Software Engineering, Artificial Intelligence
What problem does this paper attempt to address?
The paper asks how explainable artificial intelligence (XAI) techniques can improve the transparency and credibility of test case prioritisation (TCP) in regression testing. Specifically:

1. **Existing challenges**:
   - **Black-box nature of machine learning models**: Traditional machine learning models (such as random forests, gradient boosting machines and neural networks) predict well, but their internal workings are difficult to understand, which is a problem in practice (for example, in critical domains such as medicine and finance).
   - **Explanation requirements in TCP**: In software testing, and in regression testing especially, testers need to understand why a test case is ranked at a specific position and why some test cases are executed earlier than others.
2. **Research objectives**:
   - **Global explanation**: Understand which features have the greatest impact on the predictions of the model as a whole. In TCP, for example, which factors (such as the age of the test case, its execution time or the developer's experience) most influence test case priority.
   - **Local explanation**: Explain the ranking of a single test case. For example, why a test case is ranked at a specific position, why it is ranked higher than another test case, or why it may fail while another test case may pass.
   - **Cross-build explanation**: Analyse how feature contributions change between builds and how these changes affect the ranking of test cases.
3. **Specific problems**:
   - **Changes in feature contributions across builds**: As the system evolves, the importance of features may change, so these changes and their impact on TCP need to be analysed.
   - **Ranking differences of the same test case across builds**: Understand why the same test case is ranked differently in different builds.
   - **Comparative explanation**: Explain why some test cases are predicted to fail while others are predicted to pass, and how changing certain properties of a test case would change its ranking.
4. **Experimental verification**:
   - The paper includes a preliminary experiment on a system named "angel", using the LambdaMART algorithm for prediction. The results show that similar predictions usually have similar explanations, which helps developers understand the model's decision-making process more quickly.

In summary, the paper aims to make the TCP process more transparent and explainable by introducing XAI techniques, thereby improving the efficiency and credibility of software testing.
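
The global and local explanations described above, and the idea of comparing how similar two explanations are, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the feature names, the attribution values and the choice of cosine similarity and mean absolute attribution are all assumptions made for illustration. In practice, the attribution vectors would come from an XAI technique applied to the trained ranking model.

```python
import math

# Hypothetical feature names and per-test-case attribution vectors.
# The values are made up for illustration; they are not results from the paper.
FEATURES = ["test_age", "execution_time", "recent_failures"]
explanations = {
    "test_A": [0.10, -0.30, 0.60],
    "test_B": [0.12, -0.28, 0.55],
    "test_C": [-0.40, 0.50, 0.05],
}


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length attribution vectors."""
    if len(a) != len(b):
        raise ValueError("attribution vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: an all-zero explanation matches nothing
    return dot / (norm_a * norm_b)


def pairwise_similarity(expl):
    """Similarity of local explanations for every unordered pair of test cases."""
    names = sorted(expl)
    return {
        (p, q): cosine_similarity(expl[p], expl[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


def global_importance(expl, feature_names):
    """Aggregate local attributions into one global score per feature
    (mean absolute attribution, an assumed aggregation)."""
    n = len(expl)
    return {
        name: sum(abs(vec[j]) for vec in expl.values()) / n
        for j, name in enumerate(feature_names)
    }


similarities = pairwise_similarity(explanations)
importance = global_importance(explanations, FEATURES)

for pair, score in similarities.items():
    print(pair, round(score, 3))
print(max(importance, key=importance.get))
```

The same comparison could be run on the attribution vectors of one test case taken from two different builds, which would give a rough measure of how much its explanation has shifted over time.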