Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM

Gabriel Ryan, Siddhartha Jain, Mingyue Shang, Shiqi Wang, Xiaofei Ma, Murali Krishna Ramanathan, Baishakhi Ray
2024-04-03
Abstract: Testing plays a pivotal role in ensuring software quality, yet conventional Search Based Software Testing (SBST) methods often struggle with complex software units, achieving suboptimal test coverage. Recent works using large language models (LLMs) for test generation have focused on improving generation quality through optimizing the test generation context and correcting errors in model outputs, but use fixed prompting strategies that prompt the model to generate tests without additional guidance. As a result, LLM-generated testsuites still suffer from low coverage. In this paper, we present SymPrompt, a code-aware prompting strategy for LLMs in test generation. SymPrompt's approach is based on recent work that demonstrates LLMs can solve more complex logical problems when prompted to reason about the problem in a multi-step fashion. We apply this methodology to test generation by deconstructing the testsuite generation process into a multi-stage sequence, each of which is driven by a specific prompt aligned with the execution paths of the method under test, and exposing relevant type and dependency focal context to the model. Our approach enables pretrained LLMs to generate more complete test cases without any additional training. We implement SymPrompt using the TreeSitter parsing framework and evaluate on a benchmark of challenging methods from open source Python projects. SymPrompt enhances correct test generations by a factor of 5 and bolsters relative coverage by 26% for CodeGen2. Notably, when applied to GPT-4, SymPrompt improves coverage by over 2x compared to baseline prompting strategies.
Software Engineering, Machine Learning
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the quality and coverage of automatically generated test cases in software testing. Specifically:

1. **Limitations of existing test generation methods**:
   - **Search-Based Software Testing (SBST)**: Although widely used, SBST methods often struggle to achieve high coverage on complex software units, especially when branch conditions depend on specific values or states.
   - **Large Language Models (LLMs)**: Although LLMs show potential for generating test cases, they usually produce only simple, common test cases, and coverage of complex branch conditions remains low.
2. **Specific manifestations of the problem**:
   - **Low coverage**: With both SBST and LLMs, the generated test cases often fail to cover all possible execution paths, especially when the code involves complex logic and external dependencies.
   - **Generation quality**: Existing methods fall short in generating high-quality test cases, especially for branch conditions that require specific input values or states.

### Solutions

To overcome these problems, the paper proposes **SymPrompt**, a code-aware prompting strategy for LLM-based test generation. Its main contributions are:

1. **Path constraint prompting**:
   - Decompose the test generation process into multiple stages, each of which generates test cases for a specific execution path.
   - Collect path constraints through static analysis and incorporate them into the prompt, guiding the LLM to generate test cases that cover those paths.
2. **Context construction**:
   - Include the signature of the focal method, its type context, and its dependency context, giving the LLM the information it needs to generate more accurate test cases.
3. **Multi-stage prompting**:
   - Generate test cases iteratively. After each generation, the generated test cases become part of the subsequent prompt, gradually increasing test coverage.

### Experimental results

- **Benchmark tests**: Evaluated on 897 challenging methods, SymPrompt significantly improves test coverage.
  - For the CodeGen2 model, relative coverage increases by 26%.
  - For the GPT-4 model, relative coverage increases by 105%.

### Summary

With SymPrompt, the paper addresses the shortcomings of existing test generation methods in coverage and generation quality, especially for code with complex logic and external dependencies. By combining path constraint prompting with multi-stage prompting, SymPrompt generates more comprehensive test cases and improves the quality and coverage of software testing.