TestART: Improving LLM-based Unit Testing via Co-evolution of Automated Generation and Repair Iteration

Siqi Gu, Quanjun Zhang, Chunrong Fang, Fangyuan Tian, Liuchuan Zhu, Jianyi Zhou, Zhenyu Chen

2024-12-21

Abstract: Unit testing is crucial for detecting bugs in individual program units but consumes time and effort. Recently, large language models (LLMs) have demonstrated remarkable capabilities in generating unit test cases. However, several problems limit their ability to generate high-quality unit test cases: (1) compilation and runtime errors caused by the hallucination of LLMs; (2) a lack of testing and coverage feedback, which restricts gains in code coverage; (3) the repetition suppression problem, which causes invalid LLM-based repair and generation attempts. To address these limitations, we propose TestART, a novel unit test generation method. TestART improves LLM-based unit testing via co-evolution of automated generation and repair iteration. It leverages a template-based repair strategy to fix bugs in LLM-generated test cases for the first time. Meanwhile, TestART extracts coverage information from successful test cases and uses it as coverage-guided testing feedback. It also incorporates positive prompt injection to prevent repetition suppression, thereby enhancing the sufficiency of the final test cases. This synergy between generation and repair raises the correctness and sufficiency of the produced test cases beyond previous methods. In comparative experiments, TestART demonstrates an 18% improvement in pass rate and a 20% improvement in coverage across three types of datasets compared to baseline models. Additionally, it achieves better coverage rates than EvoSuite with only half the number of test cases. These results demonstrate TestART's ability to produce high-quality unit test cases by harnessing the power of LLMs while overcoming their inherent flaws.

Software Engineering

What problem does this paper attempt to address?

This paper aims to improve the quality of unit tests generated by large language models (LLMs). It targets three key problems:

1. **Compilation and runtime errors**: Because LLMs hallucinate (that is, they generate content that does not match the actual code), the generated test cases often fail to compile or fail at runtime.
2. **Lack of test and coverage feedback**: Without an effective test and coverage feedback mechanism, code coverage is hard to improve.
3. **Repetition suppression problem**: LLM-based repair and generation attempts tend to fall into an ineffective cycle, which lowers the quality of the generated test cases.

To solve these problems, the authors propose TestART, which improves LLM-based unit test generation through the co-evolution of automated generation and repair iterations. Its main innovations are:

- **Template-based repair strategy**: For the first time, a template-based repair strategy is used to correct errors in LLM-generated test cases.
- **Coverage-guided test feedback**: Coverage information is extracted from successful test cases and used as coverage-guided test feedback.
- **Positive prompt injection**: Positive prompt injection is introduced to prevent repetition suppression and enhance the sufficiency of the final test cases.

Together, these improvements raise both the correctness and the coverage of the generated test cases. Experimental results show that, compared with the baseline models, TestART achieves an 18% increase in pass rate and a 20% increase in coverage across three types of datasets. In addition, TestART achieves higher coverage than EvoSuite with only half the number of test cases.

In summary, the paper proposes a new method that improves the quality of LLM-based unit test generation through the co-evolution of automated generation and repair iterations, addressing several key problems in existing methods.
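The interplay of generation, template-based repair, and coverage feedback can be illustrated with a small sketch. This is not the authors' implementation: the function names, the single repair template, the simplified compiler error format, and the feedback wording are all hypothetical, and the LLM and test runner are passed in as plain callables so the loop can be run with stubs.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical repair template: prepend a known import when the
# (simplified, assumed) compiler message names a missing class.
KNOWN_IMPORTS = {
    "List": "import java.util.List;",
    "ArrayList": "import java.util.ArrayList;",
}

def add_missing_import(test_src: str, error: str) -> str:
    match = re.search(r"cannot find symbol: class (\w+)", error)
    if match:
        line = KNOWN_IMPORTS.get(match.group(1))
        if line and line not in test_src:
            return line + "\n" + test_src
    return test_src

TEMPLATES = [add_missing_import]

def template_repair(test_src: str, error: str) -> str:
    """Apply every template in order; templates that do not match are no-ops."""
    for template in TEMPLATES:
        test_src = template(test_src, error)
    return test_src

def generate_and_repair(
    generate: Callable[[str], str],
    run: Callable[[str], Tuple[Optional[str], float]],
    target_coverage: float = 0.9,
    max_rounds: int = 4,
    max_repairs: int = 3,
) -> Tuple[Optional[str], float]:
    """Alternate generation and repair until the coverage target is met.

    generate(feedback) returns test source; run(src) returns
    (error message or None, coverage in [0, 1]).
    """
    best_src, best_cov = None, 0.0
    feedback = ""
    for _ in range(max_rounds):
        src = generate(feedback)
        error, cov = run(src)
        # Repair phase: try fixed templates instead of asking the LLM again.
        for _ in range(max_repairs):
            if error is None:
                break
            repaired = template_repair(src, error)
            if repaired == src:
                break  # no template applied, give up on this candidate
            src = repaired
            error, cov = run(src)
        if error is None:
            if cov > best_cov:
                best_src, best_cov = src, cov
            if best_cov >= target_coverage:
                break
            # Assumed feedback format: show the passing test plus its coverage.
            feedback = f"This test passes with {cov:.0%} coverage:\n{src}\nCover the remaining code."
        else:
            feedback = f"The previous test failed with: {error}"
    return best_src, best_cov
```

Feeding the repaired, passing test back into the next prompt is one plausible reading of how generation and repair "co-evolve"; the paper itself should be consulted for the actual templates and prompt design.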

