CoverUp: Coverage-Guided LLM-Based Test Generation

Juan Altmayer Pizzorno, Emery D. Berger
2024-09-13
Abstract: Testing is an essential part of software development. Test generation tools attempt to automate the otherwise labor-intensive task of test creation, but generating high-coverage tests remains a challenge. This paper proposes CoverUp, a novel approach to driving the generation of high-coverage Python regression tests. CoverUp iteratively improves test coverage, interleaving coverage analysis with dialogs with the LLM that steer it to refine tests so that they increase coverage of lines and branches. We evaluate our prototype CoverUp implementation across a benchmark of challenging code derived from open-source Python projects, and show that CoverUp substantially improves on the state of the art. Compared to CodaMosa, a hybrid search/LLM-based test generator, CoverUp achieves a per-module median line+branch coverage of 80% (vs. 47%). Compared to MuTAP, a mutation/LLM-based test generator, CoverUp achieves an overall line+branch coverage of 90% (vs. 77%). We show that CoverUp's iterative, coverage-guided approach is crucial to its effectiveness, contributing to nearly 40% of its successes.
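The abstract reports two different aggregates: a per-module median (the CodaMosa comparison) and an overall figure (the MuTAP comparison). The minimal sketch below shows why the two can differ. It assumes that line+branch coverage pools covered lines and covered branches into a single ratio, which is how coverage.py computes its total when branch measurement is enabled; the class `ModuleCoverage`, the helper functions, and all module names and counts are hypothetical toy values, not data from the paper.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ModuleCoverage:
    # Hypothetical per-module counts (toy numbers, not results from the paper).
    name: str
    lines: int
    lines_hit: int
    branches: int
    branches_hit: int


def line_branch_pct(m: ModuleCoverage) -> float:
    # Assumed convention: pool lines and branches into one ratio.
    # A module with nothing to measure is treated as fully covered.
    total = m.lines + m.branches
    return 100.0 * (m.lines_hit + m.branches_hit) / total if total else 100.0


def per_module_median(mods: list[ModuleCoverage]) -> float:
    # Each module counts once, regardless of its size.
    return median(line_branch_pct(m) for m in mods)


def overall_pct(mods: list[ModuleCoverage]) -> float:
    # Pool the counts of all modules first, so large modules dominate.
    hit = sum(m.lines_hit + m.branches_hit for m in mods)
    total = sum(m.lines + m.branches for m in mods)
    return 100.0 * hit / total


modules = [
    ModuleCoverage("small_a", lines=8, lines_hit=2, branches=2, branches_hit=0),
    ModuleCoverage("small_b", lines=16, lines_hit=6, branches=4, branches_hit=2),
    ModuleCoverage("large_c", lines=160, lines_hit=150, branches=40, branches_hit=34),
]
print(per_module_median(modules))      # 40.0
print(round(overall_pct(modules), 1))  # 84.3
```

In this toy example, two small, poorly covered modules pull the median down to 40%, while the pooled figure stays near 84% because the large module dominates the totals. The two metrics in the abstract are therefore not directly comparable with each other.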
Software Engineering, Artificial Intelligence, Machine Learning, Programming Languages
What problem does this paper attempt to address?
This paper addresses a persistent challenge in software development: automatically generating regression tests that achieve high coverage. Existing test-generation tools can automate test creation, but the tests they produce often fail to exercise many lines and branches of the code under test, so bugs on those paths can go undetected.

### Specific problem description

1. **Writing tests by hand is labor-intensive**: because writing tests takes so much effort, developers often skip it, which weakens software quality assurance.
2. **Existing test-generation tools have limited coverage**: current tools can generate tests automatically, but they struggle to reach high coverage, especially on complex code where many execution paths and branches are hard to reach.
3. **No effective iterative refinement**: existing tools do not systematically refine their tests using coverage feedback after generating them, so much of the code often remains uncovered.

### Solutions proposed in the paper

To address these problems, the paper proposes **CoverUp**, a coverage-guided approach that uses large language models (LLMs) to generate Python regression tests. Its main innovations are:

- **Combining coverage analysis with LLMs**: CoverUp uses detailed coverage information to tailor its prompts, so that the LLM focuses on the parts of the code that lack coverage.
- **Iterative dialogue**: if a test generated by the LLM does not increase coverage, or fails when run, CoverUp continues the dialogue, reporting the missing coverage or the error and asking the LLM to improve or fix the test. In this way, CoverUp refines its tests step by step toward higher coverage.
- **Providing context information**: CoverUp offers the LLM a function, `get_info`, that the LLM can call to request additional information about symbols (functions, classes, variables, etc.) used in the code excerpt, helping it generate more accurate tests.

### Experimental results

The paper evaluates CoverUp against state-of-the-art test generators, CodaMosa and MuTAP, and finds clear improvements:

- **Per-module median line+branch coverage**: 80% for CoverUp vs. 47% for CodaMosa.
- **Overall line+branch coverage**: 90% for CoverUp vs. 77% for MuTAP.
- **Role of iteration**: CoverUp's iterative, coverage-guided approach contributes to nearly 40% of its successes.

In short, by combining coverage analysis with LLMs, CoverUp overcomes a key weakness of existing test-generation tools and generates tests with substantially higher coverage.
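The iterative dialogue described above can be illustrated with a small, self-contained Python sketch. It is not CoverUp's implementation and rests on several assumptions: a scripted stand-in (`ScriptedLLM`, hypothetical) replaces a real LLM, a toy function `classify` (hypothetical) replaces real code under test, only line coverage is measured (via `sys.settrace`, not CoverUp's own coverage measurement), and the feedback messages merely paraphrase the idea of CoverUp's prompts. Branch coverage and the `get_info` function are omitted.

```python
import dis
import sys
from typing import Callable


# Toy code under test (hypothetical stand-in for a real Python module).
def classify(n: int) -> str:
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"


def executable_lines(func: Callable) -> set[int]:
    # Lines that own bytecode, excluding the `def` line itself.
    code = func.__code__
    return {ln for _, ln in dis.findlinestarts(code)
            if ln is not None and ln != code.co_firstlineno}


def run_test(test_src: str, target: Callable) -> set[int]:
    # Run one test and return the target's lines it executed; re-raises on failure.
    code, hit = target.__code__, set()

    def tracer(frame, event, arg):
        if frame.f_code is code:
            if event == "line":
                hit.add(frame.f_lineno)
            return tracer  # keep receiving line events inside the target
        return None        # do not trace any other frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(test_src, {target.__name__: target})
    finally:
        sys.settrace(previous)
    return hit


class ScriptedLLM:
    # Hypothetical stand-in for a chat model: replays canned replies in order.
    def __init__(self, replies: list[str]):
        self.replies = iter(replies)
        self.prompts: list[str] = []

    def ask(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return next(self.replies)


def coverage_guided_loop(target: Callable, llm: ScriptedLLM, max_attempts: int = 10):
    executable = executable_lines(target)
    covered: set[int] = set()
    accepted: list[str] = []
    prompt = f"Write a test for {target.__name__}; lines {sorted(executable)} lack coverage."
    for _ in range(max_attempts):
        test_src = llm.ask(prompt)
        try:
            new = (run_test(test_src, target) & executable) - covered
        except Exception as exc:
            # Failing test: discard it, report the error, ask for a fix.
            prompt = f"The test fails with {exc!r}; please fix it."
            continue
        if not new:
            # Test passes but adds nothing: say which lines still do not execute.
            prompt = f"This test still lacks coverage: lines {sorted(executable - covered)} do not execute."
            continue
        accepted.append(test_src)
        covered |= new
        if covered == executable:
            break
        prompt = f"Lines {sorted(executable - covered)} still do not execute; write another test."
    return accepted, executable - covered


llm = ScriptedLLM([
    "assert classify(-1) == 'negative'",
    "assert classify(0) == 'zero!'",      # wrong expectation -> error feedback
    "assert classify(0) == 'zero'",
    "assert classify(-7) == 'negative'",  # passes but adds no coverage -> coverage feedback
    "assert classify(3) == 'positive'",
])
accepted, missing = coverage_guided_loop(classify, llm)
print(len(accepted), missing)  # 3 set()
```

In this sketch, a failing test is discarded and answered with its error, while a passing test that adds no coverage is answered with the lines that still do not execute; only tests that increase coverage are kept, and the loop stops once every executable line is covered or the attempt budget runs out.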