Abstract:Knowing the effectiveness of a test suite is essential for many activities such as assessing the test adequacy of code and guiding the generation of new test cases. Mutation testing is a commonly used defect injection technique for evaluating the effectiveness of a test suite. However, it is usually computationally expensive, as a large number of mutants (buggy versions) are needed to be generated from a production code under test and executed against the test suite. In order to reduce the expensive testing cost, recent studies proposed to use supervised models to predict the effectiveness of a test suite without executing the test suite against the mutants. Nonetheless, the training of such a supervised model requires labeled data, which still depends on the costly mutant execution. Furthermore, existing models are based on traditional supervised learning techniques, which assume that the training and testing data come from the same distribution. But, in practice, software systems are subject to considerable concept drifts, i.e., the same distribution assumption usually does not hold. This can lead to inaccurate predictions of a learned supervised model on the target code as time progresses. To tackle these problems, in this paper, we propose a Coverage-Based Unsupervised Approach (CBUA) for evaluating the effectiveness of a test suite. Given a production code under test, the corresponding mutants, and a test suite, CBUA first collects the coverage information of the mutated statements in the target production code under the execution of the test suite. Then, CBUA employs coverage to estimate the probability of each mutant being alive. As such, a mutation score is computed to evaluate the test suite effectiveness and the predicted labels (i.e., killed or alive) are obtained. The whole process only requires a one-time execution of the test suite against the target production code, without involving any mutant execution and any training data. CBUA can ensure the score monotonicity property (i.e., adding test cases to a test suite does not decrease its mutation score), which may be violated by a supervised approach. The experimental results show that CBUA is very competitive with the state-of-the-art supervised approaches in prediction accuracy. In particular, CBUA is shown to be more effective in finding mutants that are covered but not killed by a test suite, which is helpful in identifying the weaknesses in the current test suite and generating new test cases accordingly. Since CBUA is an easy-to-implement approach with a low cost, we suggest that it should be used as a baseline approach for comparison when any novel prediction approach is proposed in future studies.

Do Pseudo Test Suites Lead to Inflated Correlation in Measuring Test Effectiveness?

Will My Tests Tell Me If I Break This Code?

Assessing Effectiveness of Test Suites: What Do We Know and What Should We Do?

Test suite effectiveness metric evaluation: what do we know and what should we do?

How Do Assertions Impact Coverage-Based Test-Suite Reduction?

Early Estimate the Size of Test Suites from Use Cases

An Empirical Comparison of Mutant Selection Assessment Metrics

Mind the Gap: The Difference Between Coverage and Mutation Score Can Guide Testing Efforts

How is Testing Related to Single Statement Bugs?

Predictive Mutation Testing

Comparing Mutation Coverage Against Branch Coverage in an Industrial Setting

Test Adequacy Criterion Based on Coincidental Correctness Probability

CBUA: A Probabilistic, Predictive, and Practical Approach for Evaluating Test Suite Effectiveness

Empirical Evaluation of Test Coverage for Functional Programs.

A test suite reduction approach based on pairwise interaction of requirements.

An Empirical Study on the Scalability of Selective Mutation Testing

Does mutation testing improve testing practices?

Comparing Logic Coverage Criteria on Test Case Prioritization

An Experimental Study of Four Typical Test Suite Reduction Techniques

The Inversive Relationship Between Bugs and Patches: An Empirical Study

Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures