Generation of Regression Tests From Logs With Clustering Guided by Usage Patterns

Frédéric Tamagnan, Alexandre Vernotte, Fabrice Bouquet, Bruno Legeard
DOI: https://doi.org/10.1002/stvr.1900
2024-09-21
Software Testing Verification and Reliability
Abstract: A novel metric, Usage Pattern Coverage, is introduced to measure how effectively a test suite captures the principal behaviours observed in user traces. The metric is then used to fine‐tune and benchmark clustering pipelines that select the user traces to be converted into test cases. Clustering is increasingly being used to select appropriate test suites. In this paper, we apply this approach to regression testing. Regression testing is the practice of verifying the robustness and reliability of software by retesting it after changes have been made. Creating and maintaining functional regression tests is a laborious and costly activity. To be effective, these tests must represent the actual user journeys of the application. In addition, because the regression test suite is re‐run at each major iteration of software development, the number of test cases must be kept small enough for execution to stay within the time and computational resource budget. Therefore, the selection and maintenance of functional regression tests based on the analysis of application logs has gained popularity in recent years. This paper presents a novel approach to improve regression testing by automating the creation of test suites using user traces fed into clustering pipelines. Our methodology introduces a new metric based on pattern mining to quantify the statistical coverage of prevalent user paths. This metric helps to determine the optimal number of clusters within a clustering pipeline, thus addressing the challenge of suboptimal test suite sizes. Additionally, we introduce two criteria to systematically evaluate and rank clustering pipelines. Experimentation involving 33 variations of clustering pipelines across four datasets demonstrates the potential effectiveness of our automated approach compared with manually crafted test suites. (All the experiments and data on Scanner, Spree and Booked Scheduler are available at https://github.com/frederictamagnan/STVR2024.)
Then, we analyse the semantics of the clusters based on their principal composing patterns.
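To make the idea of a pattern-based coverage metric concrete, the sketch below shows one possible reading of it. It is not the authors' definition of Usage Pattern Coverage: it assumes that a "pattern" is a contiguous n-gram of events, that a pattern is "prevalent" when it occurs in at least a given fraction of traces, and that coverage is the support-weighted share of prevalent patterns exercised by the selected traces. The function names, the event names and the thresholds are all hypothetical.

```python
from collections import Counter


def ngrams(trace, n):
    """Return the set of contiguous length-n event sequences in one trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def frequent_patterns(traces, n=2, min_support=0.5):
    """Map each n-gram to the fraction of traces containing it (its support),
    keeping only n-grams whose support reaches min_support.
    Assumption: n-grams stand in for a real pattern-mining algorithm."""
    counts = Counter(p for t in traces for p in ngrams(t, n))
    return {p: c / len(traces) for p, c in counts.items()
            if c / len(traces) >= min_support}


def pattern_coverage(test_suite, patterns, n=2):
    """Support-weighted share of frequent patterns exercised by the test suite."""
    total = sum(patterns.values())
    if total == 0:
        return 0.0
    covered = set().union(*(ngrams(t, n) for t in test_suite))
    return sum(s for p, s in patterns.items() if p in covered) / total


# Hypothetical user traces extracted from application logs.
traces = [
    ["login", "search", "add_to_cart", "checkout"],
    ["login", "search", "logout"],
    ["login", "browse", "add_to_cart", "checkout"],
    ["login", "search", "add_to_cart", "logout"],
]
patterns = frequent_patterns(traces)

# Compare two candidate single-test suites.
full = pattern_coverage([traces[0]], patterns)
partial = pattern_coverage([traces[1]], patterns)
```

Under these assumptions, a score of this kind could be computed for the trace selected from each cluster at several candidate cluster counts, and the smallest count after which coverage stops improving could be retained. The selection procedure actually used in the paper may differ.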
computer science, software engineering