Abstract:Testing is an essential part of software development. Test generation tools attempt to automate the otherwise labor-intensive task of test creation, but generating high-coverage tests remains a challenge. This paper proposes CoverUp, a novel approach to driving the generation of high-coverage Python regression tests. CoverUp iteratively improves test coverage, interleaving coverage analysis with dialogs with the LLM that steer it to refine tests so that they increase coverage of lines and branches. We evaluate our prototype CoverUp implementation across a benchmark of challenging code derived from open-source Python projects, and show that CoverUp substantially improves on the state of the art. Compared to CodaMosa, a hybrid search/LLM-based test generator, CoverUp achieves a per-module median line+branch coverage of 80% (vs. 47%). Compared to MuTAP, a mutation/LLM-based test generator, CoverUp achieves an overall line+branch coverage of 90% (vs. 77%). We show that CoverUp's iterative, coverage-guided approach is crucial to its effectiveness, contributing to nearly 40% of its successes.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: in software development, automatically generating regression test cases with high coverage remains a challenging task. Although existing test - generation tools can automate the creation of test cases, the tests generated by these tools often fail to fully cover all paths and branches of the code, resulting in potential errors remaining undetected. ### Specific problem description 1. **High labor intensity of manually writing test cases**: - Developers usually choose not to write tests because the workload of writing test cases is too large, which will affect the software quality assurance. 2. **Limitations of existing test - generation tools**: - Current test - generation tools can automate the generation of test cases, but they perform poorly in generating high - coverage tests, especially when dealing with complex code, it is difficult to cover all execution paths and branches. 3. **Lack of an effective iterative improvement mechanism**: - Existing tools lack an effective mechanism to iteratively improve according to coverage feedback after generating tests, resulting in low - quality test cases. ### Solutions proposed in the paper To solve the above problems, the paper proposes **CoverUp**, a new test - generation method based on large - language models (LLMs) and coverage - guided. The main innovations of CoverUp include: - **Combining coverage analysis with LLMs**: - CoverUp uses detailed coverage information to customize prompts, enabling LLMs to focus on the parts of the code lacking coverage. - **Iterative dialogue mechanism**: - If the tests generated by the LLMs fail to significantly improve the coverage or run - time failures occur, CoverUp will continue to have a dialogue with the LLMs, requesting improvements or error - fixing. In this way, CoverUp can gradually optimize the test cases until a relatively high coverage is achieved. - **Providing context information**: - CoverUp provides a utility function `get_info` that allows LLMs to request additional information about symbols (such as functions, classes, variables, etc.) in code snippets, thereby generating more accurate test cases. ### Experimental results The paper verifies the effectiveness of CoverUp through experiments. Compared with other state - of - the - art test - generation tools (such as CodaMosa and MuTAP), CoverUp shows significant advantages in multiple benchmark tests: - **Module - level line + branch coverage**: - CoverUp achieves a median coverage of 80% (compared with 47% of CodaMosa). - **Overall line + branch coverage**: - CoverUp achieves a coverage of 90% (compared with 77% of MuTAP). In conclusion, by combining coverage analysis and the powerful capabilities of LLMs, CoverUp successfully addresses the deficiencies of existing test - generation tools in generating high - coverage test cases and significantly improves the effectiveness and coverage of tests.

CoverUp: Coverage-Guided LLM-Based Test Generation

Effective code coverage in compositional systematic dynamic testing

Enhancing LLM-based Test Generation for Hard-to-Cover Branches via Program Analysis

Guided test generation for coverage criteria

Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing

HITS: High-coverage LLM-based Unit Test Generation via Method Slicing

TESTEVAL: Benchmarking Large Language Models for Test Case Generation

Improving Defect Detection Ability of Derived Test Cases Based on Mutated UML Activity Diagrams

LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation

Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation

Covering All the Bases: Type-Based Verification of Test Input Generators

Optimizing Search-Based Unit Test Generation with Large Language Models: an Empirical Study

Synthesizing Method Sequences for High-Coverage Testing

An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation

Search-Based Software Test Data Generation for Path Coverage Based on a Feedback-Directed Mechanism

Predicting Code Coverage without Execution

LLM-Powered Test Case Generation for Detecting Tricky Bugs

Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools

LLM-Based Code Generation Method for Golang Compiler Testing

Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM