Abstract: This paper provides a survey of the emerging area of Large Language Models (LLMs) for Software Engineering (SE). It also sets out open research challenges for the application of LLMs to technical problems faced by software engineers. LLMs' emergent properties bring novelty and creativity with applications right across the spectrum of Software Engineering activities including coding, design, requirements, repair, refactoring, performance improvement, documentation and analytics. However, these very same emergent properties also pose significant technical challenges; we need techniques that can reliably weed out incorrect solutions, such as hallucinations. Our survey reveals the pivotal role that hybrid techniques (traditional SE plus LLMs) have to play in the development and deployment of reliable, efficient and effective LLM-based SE.

What problem does this paper attempt to address?

This paper addresses the application of large language models (LLMs) to software engineering (SE) and the open research challenges that this application raises. Specifically, the paper focuses on the following aspects:

1. **Review of LLMs in Software Engineering**: The paper surveys recent developments and empirical results on LLMs in software engineering, covering coding, design, requirements, repair, refactoring, performance improvement, documentation, and analytics.
2. **Identification of Open Research Issues**: Drawing on gaps in the existing literature and on technological opportunities, the paper identifies open problems for the software engineering research community. These include how to reliably weed out incorrect solutions (such as hallucinations) and how to combine traditional software engineering methods with LLMs to build reliable LLM-based software engineering tools.
3. **Technical Challenges**: The emergent properties of LLMs bring novelty and creativity, but they also pose technical challenges, particularly in ensuring the correctness and reliability of generated content. For example, an LLM may generate code or documentation that looks plausible but is incorrect, which is a serious problem in software engineering.
4. **Automated Testing and Verification**: The paper emphasizes the role of automated testing in checking the correctness of LLM-generated content, especially when generating new features and systems. It discusses problems and solutions in automated test data generation and automated test oracle generation.
5. **Input and Output Optimization**: The paper also discusses prompt engineering on the input side and the interpretation of results on the output side, both of which can improve the performance and reliability of LLMs on software engineering tasks.

In summary, the paper gives the software engineering research community an overview of the current state of LLM applications and of future research directions, and it sets out the open technical challenges that remain to be addressed.
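The idea in item 4, using automated tests to weed out incorrect LLM output, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the candidate strings stand in for code an LLM might return, and the names `CANDIDATES`, `TEST_CASES`, `passes_tests`, and `filter_candidates` are hypothetical. The sketch also assumes a trustworthy test oracle already exists, which the paper treats as an open problem in its own right.

```python
from typing import Dict, List, Tuple

# Hypothetical candidates standing in for LLM-generated code (hard-coded here;
# no model is called). All three are meant to implement absolute value.
CANDIDATES: List[str] = [
    "def absolute(x):\n    return x if x > 0 else -x\n",  # correct
    "def absolute(x):\n    return x\n",                   # plausible, wrong for negatives
    "def absolute(x):\n    return abs(y)\n",              # refers to an undefined name
]

# Assumed test oracle: (input, expected output) pairs.
TEST_CASES: List[Tuple[int, int]] = [(3, 3), (-4, 4), (0, 0)]


def passes_tests(source: str, func_name: str, cases: List[Tuple[int, int]]) -> bool:
    """Return True only if the candidate defines func_name and passes every case."""
    namespace: Dict[str, object] = {}
    try:
        # WARNING: exec runs arbitrary code. A real pipeline would run
        # candidates in a sandboxed process with time and resource limits.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(arg) == expected for arg, expected in cases)
    except Exception:
        # Syntax errors, missing names, and runtime failures all reject the candidate.
        return False


def filter_candidates(
    candidates: List[str], func_name: str, cases: List[Tuple[int, int]]
) -> List[str]:
    """Keep only the candidates that pass the whole test suite."""
    return [src for src in candidates if passes_tests(src, func_name, cases)]


surviving = filter_candidates(CANDIDATES, "absolute", TEST_CASES)
print(f"{len(surviving)} of {len(CANDIDATES)} candidates passed")
```

Passing the tests shows only that a candidate agrees with the oracle on the sampled inputs. It does not establish correctness, which is one reason the summary pairs test data generation with test oracle generation.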

Large Language Models for Software Engineering: Survey and Open Problems
