Abstract:Background. Mutation testing is a commonly used defect injection technique for evaluating the effectiveness of a test suite. However, it is usually computationally expensive. Therefore, many mutation reduction strategies, which aim to reduce the number of mutants, have been proposed. Problem. It is important to measure the ability of a mutation reduction strategy to maintain test suite effectiveness evaluation. However, existing evaluation indicators are unable to measure the “order-preserving ability”, i.e., to what extent the mutation score order among test suites is maintained before and after mutation reduction. As a result, misleading conclusions can be achieved when using existing indicators to evaluate the reduction effectiveness. Objective. We aim to propose evaluation indicators to measure the “order-preserving ability” of a mutation reduction strategy, which is important but missing in our community. Method. Given a test suite on a Software Under Test (SUT) with a set of original mutants, we leverage the test suite to generate a group of test suites that have a partial order relationship in defect detecting ability. When evaluating a reduction strategy, we first construct two partial order relationships among the generated test suites in terms of mutation score, one with the original mutants and another with the reduced mutants. Then, we measure the extent to which the partial order under the original mutants remains unchanged in the partial order under the reduced mutants. The more partial order is unchanged, the stronger the Order Preservation ( OP ) of the mutation reduction strategy is, and the more effective the reduction strategy is. Furthermore, we propose Effort-aware Relative Order Preservation ( EROP ) to measure how much gain a mutation reduction strategy can provide compared with a random reduction strategy. Result. The experimental results show that OP and EROP are able to efficiently measure the “order-preserving ability” of a mutation reduction strategy. As a result, they have a better ability to distinguish various mutation reduction strategies compared with the existing evaluation indicators. In addition, we find that Subsuming Mutant Selection (SMS) and Clustering Mutant Selection (CMS) are more effective than the other strategies under OP and EROP. Conclusion. We suggest, for the researchers, that OP and EROP should be used to measure the effectiveness of a mutant reduction strategy, and for the practitioners, that SMS and CMS should be given priority in practice.

Is Operator-Based Mutant Selection Superior to Random Mutant Selection?

An Empirical Comparison of Mutant Selection Assessment Metrics

Mutation Selection: Some Could Be Better Than All

Selecting Fault Revealing Mutants

An Empirical Study on the Scalability of Selective Mutation Testing

MQP: Mutants Quality Prediction for Cost-Effective Mutation Testing

Mutant reduction evaluation: what is there and what is missing?

On the performance of different mutation operators of a subpopulation-based genetic algorithm for multi-robot task allocation problems

Analysis and comparison of a proposed mutation operator and its effects on the performance of genetic algorithm

An Empirical Evaluation of Mutation and Crossover Operators for Multi-Objective Uncertainty-Wise Test Minimization.

How Higher Order Mutant Testing Performs for Deep Learning Models: A Fine-Grained Evaluation of Test Effectiveness and Efficiency Improved from Second-Order Mutant-Classification Tuples

Mutation Operator Reduction for Cost-effective Deep Learning Software Testing Via Decision Boundary Change Measurement

An Approach to Test Data Generation for Killing Multiple Mutants

Targeted Mutation: A Novel Mutation Strategy for Differential Evolution

Source Code Optimization using Equivalent Mutants

Mutation Operator Reduction for Deep Learning System

Operator model for evolutionary dynamics

Performance impact of mutation operators of a subpopulation-based genetic algorithm for multi-robot task allocation problems

Test Data Generation for Mutation Testing Based on Markov Chain Usage Model and Estimation of Distribution Algorithm

An Empirical Evaluation of Manually Created Equivalent Mutants

Mutation-based Test Generation for Quantum Programs with Multi-Objective Search