ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models

Hojae Han, Jaejin Kim, Jaeseok Yoo, Youngwon Lee, Seung-won Hwang
2024-08-02
Abstract: This paper aims to extend the code generation capability of large language models (LLMs) to automatically manage comprehensive software requirements from given textual descriptions. Such requirements include both functional (i.e., achieving expected behavior for inputs) and non-functional (e.g., time/space performance, robustness, maintainability) requirements. However, textual descriptions can either express requirements verbosely or may even omit some of them. We introduce ARCHCODE, a novel framework that leverages in-context learning to organize requirements observed in descriptions and to extrapolate unexpressed requirements from them. ARCHCODE generates requirements from given descriptions, conditioning them to produce code snippets and test cases. Each test case is tailored to one of the requirements, allowing for the ranking of code snippets based on the compliance of their execution results with the requirements. Public benchmarks show that ARCHCODE better satisfies functional requirements, significantly improving Pass@k scores. Furthermore, we introduce HumanEval-NFR, the first evaluation of LLMs' non-functional requirements in code generation, demonstrating ARCHCODE's superiority over baseline methods. The implementation of ARCHCODE and the HumanEval-NFR benchmark are both publicly accessible.
Software Engineering, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
This paper addresses the failure of existing large language models (LLMs) to fully account for software requirements during code generation. Existing LLMs focus mainly on producing functionally correct code from text descriptions and often overlook non-functional requirements (NFRs) such as time/space performance, robustness, maintainability, and reliability. In addition, text descriptions may be verbose or may omit key requirements altogether, so the generated code can fail to meet the expected functional and non-functional requirements.

To address these problems, the paper proposes the ARCHCODE framework. ARCHCODE uses in-context learning (ICL) to extract the requirements stated in a text description and to infer those left unstated, then generates code and test cases conditioned on these requirements.

The main contributions of ARCHCODE are:

1. **Proposing the ARCHCODE framework**: The framework uses ICL to extract software requirements from text descriptions and guides LLMs to generate code and test cases that meet these requirements.
2. **Performance improvement**: When combined with GPT-3.5-Turbo, ARCHCODE significantly outperforms GPT-4 on the HumanEval and CodeContests benchmarks, with Pass@1 gains of 4.81% and 10.45%, respectively.
3. **Introducing the HumanEval-NFR benchmark**: This is the first code generation benchmark that evaluates functional requirements (FRs) and non-functional requirements (NFRs) together, and it is used to validate the effectiveness of ARCHCODE in meeting NFRs.

Through these contributions, ARCHCODE improves the functional correctness of generated code as well as its compliance with non-functional requirements such as efficiency and reliability.
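The abstract describes ranking code snippets by how well their execution results comply with requirement-specific test cases. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that candidates are already available as Python callables (a real pipeline would execute generated source in a sandbox) and that the score is simply the number of requirement tests passed. The names `RequirementTest`, `run_check`, `rank_candidates`, and the three `mean_*` candidates are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RequirementTest:
    """A test case tied to exactly one requirement (hypothetical structure)."""
    requirement: str                    # e.g. "functional" or "robustness"
    check: Callable[[Callable], bool]   # True if the candidate complies


def run_check(test: RequirementTest, candidate: Callable) -> bool:
    """Run one test; any exception counts as non-compliance."""
    try:
        return bool(test.check(candidate))
    except Exception:
        return False


def rank_candidates(
    candidates: Dict[str, Callable], tests: List[RequirementTest]
) -> List[Tuple[str, int]]:
    """Order candidates by the number of requirement tests they pass.

    Assumption: a plain pass count is used as the score. The sort is
    stable, so ties keep their original insertion order.
    """
    scored = [
        (name, sum(run_check(t, fn) for t in tests))
        for name, fn in candidates.items()
    ]
    scored.sort(key=lambda item: -item[1])
    return scored


# Three illustrative candidates for "return the mean of a list".
def mean_a(xs):
    return sum(xs) / len(xs)            # raises ZeroDivisionError on []


def mean_b(xs):
    return sum(xs) / len(xs) if xs else 0.0


def mean_c(xs):
    return sum(xs) // len(xs)           # wrong: integer division


CANDIDATES = {"mean_a": mean_a, "mean_b": mean_b, "mean_c": mean_c}

# One test per requirement. The empty-input behavior is an assumed
# robustness requirement chosen for this example.
TESTS = [
    RequirementTest("functional", lambda f: f([1, 2, 3, 4]) == 2.5),
    RequirementTest("robustness", lambda f: f([]) == 0.0),
]

if __name__ == "__main__":
    print(rank_candidates(CANDIDATES, TESTS))
    # [('mean_b', 2), ('mean_a', 1), ('mean_c', 0)]
```

In this sketch `mean_b` ranks first because it satisfies both the functional and the robustness test, while `mean_a` fails only on empty input and `mean_c` fails both. Tying each test to a named requirement also makes it possible to report which requirement a candidate violates, rather than only an aggregate score.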