Abstract: Retrieval Augmented Generation (RAG) has shown notable advancements in software engineering tasks. Despite its potential, RAG's application to unit test generation remains under-explored. To bridge this gap, we investigate the efficacy of RAG-based LLMs in test generation. Because RAG can draw on various knowledge sources to enhance its performance, we also explore how the source of the RAG knowledge base affects unit test generation, to provide insights into its practical benefits and limitations. Specifically, we examine RAG built upon three types of domain knowledge: 1) API documentation, 2) GitHub issues, and 3) StackOverflow Q&As. Each source offers knowledge essential for creating tests from a different perspective: API documentation provides official API usage guidelines, GitHub issues offer resolutions of API-related issues from the library developers, and StackOverflow Q&As present community-driven solutions and best practices. For our experiment, we focus on five widely used, representative Python-based machine learning (ML) projects for building, training, and deploying complex neural networks efficiently, i.e., TensorFlow, PyTorch, Scikit-learn, Google JAX, and XGBoost. We conducted experiments on the top 10% most widely used APIs across these projects, 188 APIs in total. We investigate the effectiveness of four state-of-the-art LLMs (both open- and closed-source), i.e., GPT-3.5-Turbo, GPT-4o, Mistral MoE 8x22B, and Llama 3.1 405B. Additionally, we compare three prompting strategies for generating unit test cases for the experimental APIs, i.e., zero-shot, a Basic RAG, and an API-level RAG on the three external sources. Finally, we compare the cost of the different knowledge sources used for RAG.
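The three prompting strategies differ mainly in what context is placed in front of the test-generation request. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that Basic RAG ranks the whole knowledge base against the API name, and that API-level RAG first keeps only documents tagged with the target API. It also swaps in a simple bag-of-words cosine similarity for whatever retriever the paper actually used. `Doc`, `build_prompt`, `approx_input_cost`, the `toylib` API names, the toy documents, and the per-token price are all hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    source: str           # "api_doc", "github_issue", or "stackoverflow"
    apis: frozenset[str]  # hypothetical metadata: APIs the document is about
    text: str


def tokenize(text: str) -> Counter:
    """Bag of lowercase word tokens; dotted API names are split on the dots."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[Doc], k: int) -> list[Doc]:
    """Return the k documents most similar to the query (stand-in retriever)."""
    q = tokenize(query)
    return sorted(docs, key=lambda d: cosine(q, tokenize(d.text)), reverse=True)[:k]


def build_prompt(api: str, strategy: str, docs: list[Doc], k: int = 2) -> str:
    task = f"Write pytest unit tests for the Python API `{api}`."
    if strategy == "zero-shot":
        context: list[Doc] = []
    elif strategy == "basic-rag":
        # Search the whole knowledge base with the API name as the query.
        context = retrieve(api, docs, k)
    elif strategy == "api-level-rag":
        # Keep only documents tagged with the target API, then rank them.
        context = retrieve(api, [d for d in docs if api in d.apis], k)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    blocks = [f"[{d.source}] {d.text}" for d in context]
    return "\n\n".join(blocks + [task])


def approx_input_cost(prompt: str, usd_per_1k_tokens: float) -> float:
    """Crude cost proxy: whitespace-separated words stand in for model tokens."""
    return len(prompt.split()) / 1000 * usd_per_1k_tokens


# Toy knowledge base for a hypothetical library "toylib" (not real data).
DOCS = [
    Doc("api_doc", frozenset({"toylib.clip"}),
        "toylib.clip(x, lo, hi) limits x to the closed range lo to hi."),
    Doc("stackoverflow", frozenset({"toylib.pad"}),
        "How do I use toylib.pad to pad an array with zeros?"),
    Doc("github_issue", frozenset({"toylib.clip"}),
        "Issue: toylib.clip raises ValueError when lo is greater than hi."),
]

for strategy in ("zero-shot", "basic-rag", "api-level-rag"):
    prompt = build_prompt("toylib.pad", strategy, DOCS)
    print(strategy, approx_input_cost(prompt, usd_per_1k_tokens=0.5))
```

With `toylib.pad` as the target, Basic RAG also pulls in the off-target GitHub issue about `toylib.clip`, because both documents share the `toylib` token. The API-level variant keeps only the StackOverflow post. Longer retrieved contexts also mean more input tokens, which is one way the choice of knowledge source shows up in cost.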

What problem does this paper attempt to address?

The problem this paper addresses is how effective Retrieval-Augmented Generation (RAG) is for unit test generation and how its performance varies across knowledge sources. Specifically, the authors focus on the following points:

1. **Application of RAG to unit test generation**: Although RAG has shown significant progress in software engineering tasks, its application to unit test generation is still at an exploratory stage. The authors aim to fill this gap by evaluating how well RAG performs in unit test generation.
2. **Influence of different knowledge sources on RAG**: The authors examine how different types of external knowledge sources (API documentation, GitHub issues, and StackOverflow Q&As) affect RAG's performance, in order to provide insights into its practical benefits and limitations.
3. **Improving unit test coverage**: The authors aim to use RAG to increase the line coverage of the generated unit tests, thereby improving the quality and effectiveness of the tests.
4. **Cost-benefit analysis**: Beyond RAG's technical advantages, the authors also examine the cost-benefit trade-offs of different RAG configurations, in particular how the choice of knowledge source and prompting strategy affects the cost of generating test cases.
5. **Manual analysis**: To better understand RAG's practical effect on unit test generation, the authors also manually analyze the specific impact of the different strategies on the software under test.

In summary, the main objective of this paper is to evaluate the effectiveness and potential of RAG in unit test generation, explore how different knowledge sources influence RAG's performance, and provide a useful reference for future research.
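Line coverage is the fraction of executable lines in the code under test that at least one test executes. A real evaluation would normally rely on a dedicated tool such as coverage.py. The following is only a minimal, self-contained sketch of the idea. It records the executed lines of one function with `sys.settrace` and treats every body line that carries bytecode, according to `dis.findlinestarts`, as executable. The `clip` function and the three tests are hypothetical stand-ins for an API under test and LLM-generated tests.

```python
from __future__ import annotations

import dis
import sys


def executable_lines(func) -> set[int]:
    """Body lines of `func` that carry bytecode (the `def` line is excluded)."""
    code = func.__code__
    return {line for _, line in dis.findlinestarts(code)
            if line is not None and line != code.co_firstlineno}


def line_coverage(func, tests) -> float:
    """Run each zero-argument test; return the fraction of `func`'s lines executed."""
    target = func.__code__
    hit: set[int] = set()

    def local_trace(frame, event, arg):
        if event == "line":
            hit.add(frame.f_lineno)
        return local_trace

    def global_trace(frame, event, arg):
        # Trace only frames that are running the function under test.
        return local_trace if frame.f_code is target else None

    previous = sys.gettrace()
    sys.settrace(global_trace)
    try:
        for test in tests:
            test()
    finally:
        sys.settrace(previous)  # restore any tracer that was active before
    lines = executable_lines(func)
    return len(hit & lines) / len(lines)


# Hypothetical API under test and hypothetical generated tests.
def clip(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def test_inside():
    assert clip(5, 0, 10) == 5


def test_below():
    assert clip(-1, 0, 10) == 0


def test_above():
    assert clip(11, 0, 10) == 10


print(line_coverage(clip, [test_inside]))                          # 0.6 (3 of 5 lines)
print(line_coverage(clip, [test_inside, test_below, test_above]))  # 1.0
```

The single in-range test leaves both early-return branches unexecuted. Adding the below-range and above-range cases reaches every line. That is the kind of gain the authors look for when retrieved knowledge prompts a model to exercise more of an API's behavior.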

Retrieval-Augmented Test Generation: How Far Are We?

Investigating the performance of Retrieval-Augmented Generation and fine-tuning for the development of AI-driven knowledge-based systems

LLM-based Unit Test Generation via Property Retrieval

Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools

Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation

Retrieval-Augmented Generation for Large Language Models: A Survey

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Retrieval-Augmented Generation for AI-Generated Content: A Survey

Deploying Large Language Models With Retrieval Augmented Generation

Free to play: UN Trade and Development's experience with developing its own open-source Retrieval Augmented Generation Large Language Model application

Developing Retrieval Augmented Generation (RAG) based LLM Systems from PDFs: An Experience Report

RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards

Retrieval-Augmented Machine Translation with Unstructured Knowledge

Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models

Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems

Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation

The Chronicles of RAG: The Retriever, the Chunk and the Generator

DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation