The Current Challenges of Software Engineering in the Era of Large Language Models

Cuiyun Gao, Xing Hu, Shan Gao, Xin Xia, Zhi Jin
2024-12-19
Abstract: With the advent of large language models (LLMs) in the artificial intelligence (AI) area, the field of software engineering (SE) has also witnessed a paradigm shift. These models, by leveraging the power of deep learning and massive amounts of data, have demonstrated an unprecedented capacity to understand, generate, and operate programming languages. They can assist developers in completing a broad spectrum of software development activities, encompassing software design, automated programming, and maintenance, which potentially saves substantial human effort. Integrating LLMs within the SE landscape (LLM4SE) has become a burgeoning trend, necessitating an exploration of this emergent landscape's challenges and opportunities. The paper aims to revisit the software development life cycle (SDLC) under LLMs and to highlight the challenges and opportunities of the new paradigm. The paper first summarizes the overall process of LLM4SE, and then elaborates on the current challenges based on a thorough discussion. The discussion was held among more than 20 participants from academia and industry, specializing in fields such as software engineering and artificial intelligence. Specifically, we identify 26 key challenges from seven aspects, including software requirement & design, coding assistance, testing code generation, code review, code maintenance, software vulnerability management, and data, training, and evaluation. We hope the identified challenges will benefit future research in the LLM4SE field.
Software Engineering
What problem does this paper attempt to address?
This paper addresses the challenges and opportunities facing the field of software engineering (SE) in the era of large language models (LLMs). With the rise of LLMs in artificial intelligence, these models have demonstrated an unprecedented ability to understand, generate, and manipulate programming languages, and they can assist developers in a wide range of software development activities, such as software design, automated programming, and maintenance. However, integrating LLMs into software engineering (LLM4SE) also raises many new problems.

### Core Problems of the Paper

1. **Re-examining the Software Development Life Cycle (SDLC)**: The paper aims to re-evaluate the software development life cycle under the influence of LLMs and to explore the challenges and opportunities brought by this new paradigm.
2. **Summarizing Current Challenges**: Based on a discussion among more than 20 participants from academia and industry, the paper summarizes 26 key challenges in LLM4SE, covering seven aspects:
   - Software Requirements and Design
   - Coding Assistance
   - Test Code Generation
   - Code Review
   - Code Maintenance
   - Software Vulnerability Management
   - Data, Training, and Evaluation

### Specific Challenges

#### 1. **Software Requirements and Design**

- **Requirement/Design Prompts**: LLMs rely on comprehensive prompts and context information to generate effective outputs, and different prompts may lead to significantly different results. Ensuring that prompts are effective and accurate is a challenge.
- **Structured Descriptions**: LLMs are limited in handling long contexts and by their inductive biases, so they struggle with lengthy requirement documents and long-running conversations.
- **Lack of Domain Expert Knowledge**: LLMs may lack knowledge of specific domains; this gap has to be filled by domain experts or fine-tuned models.
- **Evolvability of Software Requirements**: The evolution of requirements is the root of software evolution. How to use LLMs to adapt to and generate changing requirements has not been fully studied.
- **Comprehensive Evaluation**: Quality evaluation criteria for automatically generated requirements and designs are not yet mature.
- **Consistency between Modeling and Natural-Language Descriptions**: Although the UML code generated by LLMs is syntactically correct, it has problems with semantic quality and consistency.

#### 2. **Coding Assistance**

- **Accuracy and Reliability of Code Generation**: The code generated by LLMs may be incorrect or may not follow best practices.
- **Code Style and Specifications**: Different development teams follow different coding specifications. Ensuring that generated code complies with them is a challenge.
- **Code Explanation and Debugging**: Automatically generated code may be difficult to understand and debug, especially when the logic is complex.

#### 3. **Test Code Generation**

- **Test Coverage**: Ensuring that the generated test code covers all possible situations.
- **Dynamic Testing**: Generating test code that can respond to changing requirements and environments.

#### 4. **Code Review**

- **Code Quality Assessment**: Automatically reviewing the quality and security of code.
- **Code Style and Specification Checks**: Ensuring that the code complies with the team's coding standards.

#### 5. **Code Maintenance**

- **Log Analysis and Anomaly Detection**: Extracting valuable information from logs and detecting abnormal behavior.
- **User Feedback Analysis**: Analyzing user feedback to improve software.

#### 6. **Software Vulnerability Management**

- **Vulnerability Detection and Repair**: Using LLMs to improve the efficiency of vulnerability detection and repair.
- **Supply-Chain Analysis**: Ensuring the security and integrity of software components.

#### 7. **Data, Training, and Evaluation**

- **Construction of High-Quality Datasets**: Ensuring that the datasets used to train LLMs are of high quality and diverse.
- **Model Evaluation and Validation**: Effectively evaluating and validating the performance of LLMs on SE tasks.

### Conclusion

By systematically summarizing and analyzing the challenges in LLM4SE, the paper provides directions for future research. It aims to make better use of the strengths of LLMs while overcoming their limitations, and thereby to advance the field of software engineering.