Optimizing Search-Based Unit Test Generation with Large Language Models: an Empirical Study

Danni Xiao,Yimeng Guo,Yanhui Li,Lin Chen
DOI: https://doi.org/10.1145/3671016.3674813
2024-01-01
Abstract:Search-based unit test generation methods have been considered effective and widely applied, and Large Language Models (LLMs) have also demonstrated their powerful generation ability. Therefore, some scholars have proposed using LLMs to enhance search-based unit test generation methods and have preliminarily confirmed that LLMs can help alleviate the problem of test coverage plateaus. However, it is still unclear when and how LLMs should intervene in the time-consuming test generation process. This paper explores the application of LLMs at various stages of search-based test generation (SBTG) (including the initial stage, the test generation period, and the test coverage plateaus), as well as strategies for controlling the frequency of LLM intervention. A comprehensive empirical study was conducted on 486 Python benchmark modules from 27 projects. The experimental results show that 1) LLM intervention has a positive effect at any stage, whether to improve coverage over a fixed period or to reduce the time to reach a specific coverage; 2) a reasonable intervention frequency is crucial for LLMs to have a positive effect on SBTG. This work can better help understand when and how LLMs should be applied in SBTG and provide valuable suggestions for developers in practice.
What problem does this paper attempt to address?