TestART: Improving LLM-based Unit Testing via Co-evolution of Automated Generation and Repair Iteration

Siqi Gu, Quanjun Zhang, Chunrong Fang, Fangyuan Tian, Liuchuan Zhu, Jianyi Zhou, Zhenyu Chen
2024-12-21
Abstract: Unit testing is crucial for detecting bugs in individual program units but consumes time and effort. Recently, large language models (LLMs) have demonstrated remarkable capabilities in generating unit test cases. However, several problems limit their ability to generate high-quality unit test cases: (1) compilation and runtime errors caused by the hallucination of LLMs; (2) lack of testing and coverage feedback information restricting the increase of code coverage; (3) the repetition suppression problem causing invalid LLM-based repair and generation attempts. To address these limitations, we propose TestART, a novel unit test generation method. TestART improves LLM-based unit testing via co-evolution of automated generation and repair iteration, representing a significant advancement in automated unit test generation. TestART leverages the template-based repair strategy to effectively fix bugs in LLM-generated test cases for the first time. Meanwhile, TestART extracts coverage information from successful test cases and uses it as coverage-guided testing feedback. It also incorporates positive prompt injection to prevent repetition suppression, thereby enhancing the sufficiency of the final test case. This synergy between generation and repair elevates the correctness and sufficiency of the produced test cases significantly beyond previous methods. In comparative experiments, TestART demonstrates an 18% improvement in pass rate and a 20% enhancement in coverage across three types of datasets compared to baseline models. Additionally, it achieves better coverage rates than EvoSuite with only half the number of test cases. These results demonstrate TestART's superior ability to produce high-quality unit test cases by harnessing the power of LLMs while overcoming their inherent flaws.
Software Engineering
What problem does this paper attempt to address?
This paper aims to improve the quality of unit tests generated by large language models (LLMs). Specifically, it targets three key problems:

1. **Compilation and runtime errors**: Because LLMs hallucinate (i.e., generate content that does not conform to the actual code logic), the generated test cases contain compilation and runtime errors.
2. **Lack of test and coverage feedback information**: Without an effective test and coverage feedback mechanism, code coverage is hard to improve.
3. **Repetition suppression problem**: LLM-based repair and generation attempts tend to fall into an ineffective cycle, resulting in low-quality test cases.

To solve these problems, the authors propose TestART, which improves LLM-based unit test generation through the co-evolution of automated generation and repair iterations. The main innovations of TestART include:

- **Template-based repair strategy**: For the first time, a template-based repair strategy is used to correct errors in LLM-generated test cases.
- **Coverage-guided test feedback**: Coverage information is extracted from successful test cases and used as coverage-guided test feedback.
- **Positive prompt injection**: Positive prompt injection is introduced to prevent repetition suppression and enhance the sufficiency of the final test cases.

Through these improvements, TestART significantly improves the correctness and coverage of test cases. Experimental results show that, compared with the baseline models, TestART achieves an 18% increase in pass rate and a 20% increase in coverage across three types of datasets. In addition, TestART achieves higher coverage than EvoSuite with only half the number of test cases.
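The interplay of the three ideas above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `RunResult`, `add_import`, `TEMPLATES`, `apply_templates`, and `co_evolve` are hypothetical, the single repair template (prepend a `java.util` import when a class symbol is missing) is a toy stand-in for a template set, and the prompt wording is invented. The `generate` and `run` callables stand for an LLM call and a compile-and-execute step that the caller must supply.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RunResult:
    """Outcome of compiling and running one candidate test (hypothetical shape)."""
    passed: bool
    error: str = ""
    uncovered: List[int] = field(default_factory=list)


def add_import(test_src: str, match: re.Match) -> str:
    # Toy fix: assume the missing class lives in java.util (an assumption).
    return f"import java.util.{match.group(1)};\n{test_src}"


# A repair template pairs an error pattern with a deterministic fix.
Template = Tuple[str, Callable[[str, re.Match], str]]
TEMPLATES: List[Template] = [(r"cannot find symbol: class (\w+)", add_import)]


def apply_templates(test_src: str, error: str,
                    templates: List[Template]) -> Optional[str]:
    """Return the repaired test, or None if no template matches the error."""
    for pattern, fix in templates:
        match = re.search(pattern, error)
        if match:
            return fix(test_src, match)
    return None


def co_evolve(generate: Callable[[str], str],
              run: Callable[[str], RunResult],
              templates: List[Template],
              max_rounds: int = 5) -> Optional[str]:
    """Alternate generation and repair; return the last passing test, if any."""
    prompt = "Write a JUnit test for the focal method."
    test_src = generate(prompt)
    best = None
    for _ in range(max_rounds):
        result = run(test_src)
        if not result.passed:
            # Try a template fix first; fall back to the LLM if none applies.
            repaired = apply_templates(test_src, result.error, templates)
            if repaired is not None:
                test_src = repaired
            else:
                test_src = generate(f"{prompt}\nFix this error: {result.error}")
            continue
        best = test_src
        if not result.uncovered:
            break
        # Feed the passing test back as a positive example, plus coverage gaps.
        prompt = (f"This test compiles and passes:\n{test_src}\n"
                  f"Add cases that cover lines {result.uncovered}.")
        test_src = generate(prompt)
    return best
```

In this sketch the template step runs before any further LLM call, so a deterministic fix never costs an extra generation round, and only passing tests are placed back into the prompt. Whether TestART orders or words these steps the same way is not stated in the summary above.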
In summary, this paper proposes a new method that improves the quality of LLM-based unit test generation through the co-evolution of automated generation and repair iterations, addressing several key problems in existing methods.