Large Language Models for Software Engineering: Survey and Open Problems

Angela Fan, Beliz Gokkaya, Mark Harman, Mitya Lyubarskiy, Shubho Sengupta, Shin Yoo, Jie M. Zhang
2023-11-12
Abstract: This paper provides a survey of the emerging area of Large Language Models (LLMs) for Software Engineering (SE). It also sets out open research challenges for the application of LLMs to technical problems faced by software engineers. LLMs' emergent properties bring novelty and creativity with applications right across the spectrum of Software Engineering activities including coding, design, requirements, repair, refactoring, performance improvement, documentation and analytics. However, these very same emergent properties also pose significant technical challenges; we need techniques that can reliably weed out incorrect solutions, such as hallucinations. Our survey reveals the pivotal role that hybrid techniques (traditional SE plus LLMs) have to play in the development and deployment of reliable, efficient and effective LLM-based SE.
Software Engineering
What problem does this paper attempt to address?
This paper addresses the application of large language models (LLMs) to software engineering (SE) and the open research challenges that this application raises. Specifically, it focuses on the following aspects:

1. **Review of LLMs in Software Engineering**: The paper surveys recent developments, progress, and empirical results for LLMs in software engineering, covering code generation, design, requirements analysis, bug fixing, refactoring, performance optimization, documentation generation, and analytics.
2. **Identification of Open Research Issues**: Drawing on gaps in the existing literature and on technological opportunities, the paper identifies open problems for the software engineering research community. These include how to reliably exclude erroneous solutions (such as hallucinations) and how to combine traditional software engineering methods with LLMs to build reliable LLM-based software engineering tools.
3. **Technical Challenges**: The emergent properties of LLMs bring novelty and creativity, but they also make it hard to ensure that generated content is correct and reliable. For example, an LLM may produce code or documentation that looks plausible but is wrong, which is a serious issue in software engineering.
4. **Automated Testing and Verification**: The paper emphasizes the role of automated testing in checking the correctness of LLM-generated content, especially when new features and systems are generated. It discusses problems and solutions related to automated test data generation and automated test oracle generation.
5. **Input and Output Optimization**: The paper also discusses prompt engineering on the input side and interpretation of results on the output side, both of which can improve the performance and reliability of LLMs on software engineering tasks.

In summary, the paper gives the software engineering research community an overview of the current state of LLM applications and of future research directions, and sets out a series of technical challenges that remain to be addressed.
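The hybrid idea in point 4, using automated tests to weed out incorrect LLM output, can be illustrated with a minimal sketch. It is not taken from the paper: the candidate strings stand in for hypothetical LLM outputs, and the names `CANDIDATES`, `TESTS`, `passes_tests`, and `filter_candidates` are assumptions made for illustration only.

```python
from typing import Any, List, Tuple

# Hypothetical candidates: source strings standing in for LLM outputs.
CANDIDATES: List[str] = [
    "def add(a, b):\n    return a - b\n",  # plausible-looking but wrong
    "def add(a, b):\n    return a + b\n",  # correct
    "def add(a, b)\n    return a + b\n",   # does not even parse
]

# A tiny test oracle: (arguments, expected result) pairs.
TESTS: List[Tuple[Tuple[Any, ...], Any]] = [
    ((1, 2), 3),
    ((0, 0), 0),
    ((-1, 1), 0),
]


def passes_tests(source: str, func_name: str,
                 tests: List[Tuple[Tuple[Any, ...], Any]]) -> bool:
    """Return True only if the candidate defines func_name and passes every test."""
    namespace: dict = {}
    try:
        # NOTE: exec runs arbitrary code. A real pipeline would need a sandbox;
        # this sketch assumes trusted toy inputs.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Syntax errors, missing functions, and runtime failures all reject.
        return False


def filter_candidates(candidates: List[str], func_name: str,
                      tests: List[Tuple[Tuple[Any, ...], Any]]) -> List[str]:
    """Keep only the candidates that survive the test suite."""
    return [c for c in candidates if passes_tests(c, func_name, tests)]


survivors = filter_candidates(CANDIDATES, "add", TESTS)
print(f"{len(survivors)} of {len(CANDIDATES)} candidates passed")
```

Passing a test suite only shows agreement with that oracle, not correctness in general, which is why the quality of generated test data and test oracles matters as much as the filtering step itself.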