A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation

Avinash Anand, Akshit Gupta, Nishchay Yadav, Shaurya Bajaj
2024-11-12
Abstract: Bug fixing and code generation have been core research topics in software development for many years. The recent rapid growth of Large Language Models (LLMs) has transformed both areas, putting powerful new tools within reach. This survey reviews 27 recent papers, split into two groups: one dedicated to Automated Program Repair (APR) and LLM integration, and the other to code generation using LLMs. The first group covers new methods for bug detection and repair, including locating semantic errors, security vulnerabilities, and runtime failures. This work emphasizes the role of LLMs in reducing manual debugging effort by steering APR toward context-aware fixes, with innovations that improve the accuracy and efficiency of automated debugging. The second group focuses on code generation, giving an overview of both general-purpose LLMs fine-tuned for programming and task-specific models. It also presents methods for improving code generation, such as identifier-aware training, instruction-level fine-tuning, and incorporating semantic code structures. The survey contrasts the methodologies used in APR and code generation to identify common trends, such as the use of LLMs, feedback loops that enable iterative code improvement, and open-source models. It also discusses the challenges of achieving functional correctness and security, and outlines future research directions for LLM-based software development.
Artificial Intelligence
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address two core issues in software development: Automated Program Repair (APR) and code generation. Specifically:

1. **Automated Program Repair (APR)**:
   - **Problem Identification**: How to use Large Language Models (LLMs) to detect and fix semantic errors, security vulnerabilities, and runtime errors in programs.
   - **Challenges**: Existing APR tools still face challenges in accuracy, reliability, context sensitivity, and resource overhead. For example, APR tools sometimes flag correct code as erroneous and miss genuine errors; when dealing with large, complex codebases, they may fail to capture all dependencies, leading to incorrect fixes; and their high memory and compute demands can disrupt users' workflows and reduce productivity.
2. **Code Generation**:
   - **Problem Identification**: How to use LLMs to generate high-quality code, including generating code from natural-language requests, fixing errors in existing code, and understanding large, complex code repositories.
   - **Challenges**: Although LLMs perform well on code generation tasks, challenges remain, such as limited generalization, weak understanding of domain-specific code, security issues, biases in training data, and overfitting to benchmark test sets.

### Main Objectives of the Paper

1. **Collect and Summarize Research**: Compile recent research on the use of LLMs in APR and code generation, summarizing what has been achieved.
2. **Clarify Application Scenarios**: Detail the scenarios and programming languages in which these tools can be used for repair.
3. **Methods of Integrating LLMs**: Explore how LLMs can be integrated into code repair and generation workflows, and the challenges faced in this process.
4. **Analyze Limitations**: Discuss the limitations of using LLMs for code-related tasks and point out issues that remain open research questions.

Through these objectives, the paper aims to give researchers and developers a comprehensive perspective that helps them better understand and apply these technologies.
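The feedback loops for iterative code improvement mentioned in the abstract, and the APR challenges of incorrect fixes and test overfitting, can be made concrete with a small generate-and-validate loop. This is a minimal Python sketch for illustration only, not a method taken from any surveyed paper: a candidate program is run against tests, failure messages are fed back to a patch generator, and the loop stops once every test passes. All names (`run_tests`, `repair_loop`, `toy_patcher`) are hypothetical, and `toy_patcher` is an assumed placeholder standing in for an LLM call.

```python
from typing import Callable, Optional

# One check: (positional arguments, expected return value).
Check = tuple[tuple, object]
# A patch generator receives the current source and the failure messages.
PatchGenerator = Callable[[str, list[str]], str]


def run_tests(source: str, func_name: str, checks: list[Check]) -> list[str]:
    """Load `source`, call `func_name` on each check, return failure messages."""
    namespace: dict = {}
    try:
        # exec() runs arbitrary code; a real tool would use a sandbox.
        exec(source, namespace)
        func = namespace[func_name]
    except Exception as exc:
        return [f"load error: {exc!r}"]
    failures = []
    for args, expected in checks:
        try:
            actual = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if actual != expected:
            failures.append(f"{func_name}{args} returned {actual!r}, expected {expected!r}")
    return failures


def repair_loop(source: str, func_name: str, checks: list[Check],
                propose_patch: PatchGenerator, max_rounds: int = 3) -> Optional[str]:
    """Generate-and-validate loop: test, feed failures back, re-patch."""
    for _ in range(max_rounds):
        failures = run_tests(source, func_name, checks)
        if not failures:
            return source  # plausible patch: passes every check
        source = propose_patch(source, failures)
    # Validate the last proposal before giving up.
    return source if not run_tests(source, func_name, checks) else None


def toy_patcher(source: str, failures: list[str]) -> str:
    """Hypothetical stand-in for an LLM; ignores `failures`, swaps one operator."""
    return source.replace("a - b", "a + b", 1)


if __name__ == "__main__":
    buggy = "def add(a, b):\n    return a - b\n"
    checks = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
    print(repair_loop(buggy, "add", checks, toy_patcher))
```

Passing every test makes a patch only plausible, not necessarily correct: a weak test suite can accept a wrong fix, which is one concrete form of the accuracy and overfitting concerns listed above.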