Abstract: Test case prioritisation (TCP) is a critical task in regression testing to ensure quality as software evolves. Machine learning has become a common way to achieve it. In particular, learning-to-rank (LTR) algorithms provide an effective method of ordering and prioritising test cases. However, their use poses a challenge in terms of explainability, both globally at the model level and locally for particular results. Here, we present and discuss scenarios that require different explanations and how the particularities of TCP (multiple builds over time, test case and test suite variations, etc.) could influence them. We include a preliminary experiment to analyse the similarity of explanations, showing that they vary not only with test case-specific predictions but also with the relative ranks.
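The kind of similarity analysis the abstract mentions can be sketched as follows. This is a minimal illustration, not the authors' experimental setup: it assumes each test case's explanation is a vector of additive per-feature attributions (as a method such as SHAP would produce), and the feature names, test names and numbers are hypothetical toy values. It ranks the tests by the score implied by their attributions, then pairs every two tests with their rank gap and the cosine similarity of their explanations.

```python
import math

# Hypothetical toy attributions, one vector per test case. Entries follow the
# order in FEATURES. These are illustrative values, not data from the paper.
FEATURES = ["recent_failures", "exec_time", "age"]
ATTRIBUTIONS = {
    "test_a": [0.9, -0.1, 0.2],
    "test_b": [0.8, -0.2, 0.1],
    "test_c": [0.1, 0.4, -0.3],
}


def cosine(u, v):
    """Cosine similarity between two attribution vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_tests(attributions, base_value=0.0):
    """Order tests by predicted score, assuming an additive explanation where
    score = base_value + sum of the test's attributions."""
    scores = {t: base_value + sum(a) for t, a in attributions.items()}
    return sorted(scores, key=scores.get, reverse=True)


def similarity_by_rank_gap(attributions):
    """Return (first, second, rank_gap, similarity) for every pair of tests,
    so explanation similarity can be compared with distance in the ranking."""
    ranking = rank_tests(attributions)
    pairs = []
    for i, first in enumerate(ranking):
        for j in range(i + 1, len(ranking)):
            second = ranking[j]
            sim = cosine(attributions[first], attributions[second])
            pairs.append((first, second, j - i, sim))
    return pairs


if __name__ == "__main__":
    for first, second, gap, sim in similarity_by_rank_gap(ATTRIBUTIONS):
        print(f"{first} vs {second}: rank gap {gap}, similarity {sim:.3f}")
```

With these toy vectors, the two top-ranked tests have nearly identical explanations, while test_c's explanation differs from both, including from its immediate neighbour in the ranking. Cosine similarity is only one possible choice; a rank correlation over feature-importance orderings would be an equally reasonable assumption.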

What problem does this paper attempt to address?

The problem this paper attempts to solve is how explainable artificial intelligence (XAI) techniques can make test case prioritisation (TCP) in regression testing more transparent and trustworthy. Specifically:

1. **Existing challenges**
   - **Black-box nature of machine-learning models**: Machine-learning models such as random forests, gradient-boosting machines and neural networks predict well, but their internal workings are hard to understand, which creates difficulties in practice, especially in critical domains such as medicine and finance.
   - **Explanation requirements in TCP**: In software testing, and especially in regression testing, testers need to understand why a test case is ranked at a particular position and why some test cases are executed before others.
2. **Research objectives**
   - **Global explanation**: Identify which features have the greatest influence on the model's predictions overall. In TCP, for example, which factors (such as test case age, execution time or developer experience) most affect a test case's priority.
   - **Local explanation**: Explain the ranking of an individual test case, for example why it is placed at a specific position, why it ranks above another test case, or why it may fail while another may pass.
   - **Cross-build explanation**: Analyse how feature contributions change between builds and how these changes affect the ranking of test cases.
3. **Specific problems**
   - **Changes in feature contributions across builds**: As the system evolves, the importance of features may change, so these changes and their impact on TCP need to be analysed.
   - **Ranking differences of the same test case across builds**: Understand why the same test case is ranked differently in different builds.
   - **Comparative explanation**: Explain why some test cases are predicted to fail while others are predicted to pass, and how changing certain properties could improve a test case's rank.
4. **Experimental verification**
   - The paper examines these questions experimentally on a system named "angel", using the LambdaMART algorithm for prediction. The results show that similar predictions usually have similar explanations, which can help developers understand the model's decision-making more quickly.

In summary, this paper aims to make TCP more transparent and explainable by introducing XAI techniques, thereby improving the efficiency and trustworthiness of software testing.
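The comparative and cross-build questions above can be made concrete with a small sketch. It is not the paper's method: LambdaMART is a tree ensemble, whereas this example assumes a hypothetical linear scoring model, because a linear model's per-feature contributions are exact and easy to check. The weights, feature names, `Case` type and build data are all illustrative assumptions. `contrast` explains why one test outranks another, or why the same test moved between builds. `importance_shift` compares global feature importance, taken here as the mean absolute contribution against an all-zero baseline, across two builds.

```python
from dataclasses import dataclass

# Hypothetical linear scoring model standing in for an LTR model such as
# LambdaMART. For a linear model each contribution w_f * x_f is exact.
WEIGHTS = {"recent_failures": 2.0, "exec_time": -0.5, "age": 0.3}


@dataclass
class Case:
    """A test case as seen in one build: a name plus its feature values."""
    name: str
    features: dict


def score(case):
    """Linear relevance score; a higher score means the test runs earlier."""
    return sum(WEIGHTS[f] * case.features[f] for f in WEIGHTS)


def contrast(a, b):
    """Per-feature contributions to score(a) - score(b).

    The entries sum exactly to the score difference, so positive values are
    features pushing `a` above `b`. Works for two tests in the same build
    (comparative explanation) or for one test in two builds."""
    return {f: WEIGHTS[f] * (a.features[f] - b.features[f]) for f in WEIGHTS}


def global_importance(build):
    """Mean absolute contribution of each feature over one build's tests,
    measured against an all-zero baseline."""
    return {
        f: sum(abs(WEIGHTS[f] * c.features[f]) for c in build) / len(build)
        for f in WEIGHTS
    }


def importance_shift(old_build, new_build):
    """Change in global importance between two builds (new minus old)."""
    old, new = global_importance(old_build), global_importance(new_build)
    return {f: new[f] - old[f] for f in WEIGHTS}


if __name__ == "__main__":
    build_1 = [
        Case("t1", {"recent_failures": 1, "exec_time": 2.0, "age": 3}),
        Case("t2", {"recent_failures": 0, "exec_time": 1.0, "age": 10}),
    ]
    build_2 = [
        Case("t1", {"recent_failures": 0, "exec_time": 2.0, "age": 4}),
        Case("t2", {"recent_failures": 2, "exec_time": 1.0, "age": 11}),
    ]
    print("why t2 outranks t1 in build 1:", contrast(build_1[1], build_1[0]))
    print("t1, build 2 vs build 1:", contrast(build_2[0], build_1[0]))
    print("importance shift:", importance_shift(build_1, build_2))
```

For a non-linear model, the same questions could be asked of attributions from a model-agnostic explainer. In that case, the contributions would be approximations rather than an exact decomposition of the score difference.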

Towards Explainable Test Case Prioritisation with Learning-to-Rank Models

Evaluating Local Model-Agnostic Explanations of Learning to Rank Models with Decision Paths

Test Case Prioritization Techniques for Model-Based Testing: A Replicated Study

On the Relationship between Explanation and Recommendation: Learning to Rank Explanations for Improved Performance

Ranking by Aggregating Referees: Evaluating the Informativeness of Explanation Methods for Time Series Classification

Learning to Scaffold: Optimizing Model Explanations for Teaching

Evaluating the Explainability of Neural Rankers

On Rank Aggregating Test Prioritizations

Regression Compatible Listwise Objectives for Calibrated Ranking with Binary Relevance

Explainable CTR Prediction via LLM Reasoning

Reinforcement Learning for Test Case Prioritization

Explain then Rank: Scale Calibration of Neural Rankers Using Natural Language Explanations from LLMs

Inference-time Stochastic Ranking with Risk Control

Scale-Invariant Learning-to-Rank

Traceability Link Recovery between Requirements and Models using an Evolutionary Algorithm Guided by a Learning to Rank Algorithm: Train control and management case

TSPRank: Bridging Pairwise and Listwise Methods with a Bilinear Travelling Salesman Model

Optimizing Group-Fair Plackett-Luce Ranking Models for Relevance and Ex-Post Fairness

Robust Ranking Explanations

Test Case Generation Evaluator for the Implementation of Test Case Generation Algorithms Based on Learning to Rank

Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales

Segment-Based Test Case Prioritization: A Multi-objective Approach