ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models

Hojae Han, Jaejin Kim, Jaeseok Yoo, Youngwon Lee, Seung-won Hwang

2024-08-02

Abstract: This paper aims to extend the code generation capability of large language models (LLMs) to automatically manage comprehensive software requirements from given textual descriptions. Such requirements include both functional (i.e., achieving expected behavior for inputs) and non-functional (e.g., time/space performance, robustness, maintainability) requirements. However, textual descriptions can either express requirements verbosely or may even omit some of them. We introduce ARCHCODE, a novel framework that leverages in-context learning to organize requirements observed in descriptions and to extrapolate unexpressed requirements from them. ARCHCODE generates requirements from given descriptions, conditioning them to produce code snippets and test cases. Each test case is tailored to one of the requirements, allowing for the ranking of code snippets based on the compliance of their execution results with the requirements. Public benchmarks show that ARCHCODE improves the satisfaction of functional requirements, significantly improving Pass@k scores. Furthermore, we introduce HumanEval-NFR, the first evaluation of LLMs' non-functional requirements in code generation, demonstrating ARCHCODE's superiority over baseline methods. The implementation of ARCHCODE and the HumanEval-NFR benchmark are both publicly accessible.

Software Engineering, Artificial Intelligence, Computation and Language

What problem does this paper attempt to address?

This paper addresses the failure of existing large language models (LLMs) to fully account for software requirements during code generation. Existing LLMs focus mainly on generating functionally correct code from text descriptions and often overlook non-functional requirements (NFRs) such as time/space performance, robustness, maintainability, and reliability. In addition, text descriptions may state requirements verbosely or omit some key requirements altogether, so the generated code can fail to meet the expected functional and non-functional requirements.

To address these problems, the paper proposes the ARCHCODE framework. ARCHCODE uses in-context learning (ICL) to extract the requirements expressed in a text description, infer the ones left unexpressed, and generate code and test cases that meet these requirements.

The main contributions of ARCHCODE include:

1. **Proposing the ARCHCODE framework**: The framework uses ICL to extract software requirements from text descriptions and guides LLMs to generate code and test cases that meet these requirements.
2. **Performance improvement**: When combined with GPT-3.5-Turbo, ARCHCODE outperforms GPT-4 on the HumanEval and CodeContests benchmarks, with Pass@1 scores higher by 4.81% and 10.45%, respectively.
3. **Introducing the HumanEval-NFR benchmark**: This is the first code-generation benchmark that evaluates functional requirements (FRs) and non-functional requirements (NFRs) together, and it is used to validate the effectiveness of ARCHCODE in meeting NFRs.

Through these contributions, ARCHCODE improves the functional correctness of generated code as well as its compliance with non-functional requirements such as efficiency and reliability.
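The ranking idea described here (each test case is tied to one requirement, and candidate code is ordered by how well its execution results comply) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `RequirementTest`, `passes`, `rank_candidates`, and the two candidate functions are hypothetical names, and scoring a candidate by its number of passed tests is an assumed simplification.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class RequirementTest:
    """One test case tied to exactly one requirement (hypothetical structure)."""
    requirement: str   # e.g. "functional: general case" or "robustness: empty input"
    args: tuple        # positional arguments passed to the candidate
    expected: Any      # expected return value


def passes(candidate: Callable, test: RequirementTest) -> bool:
    """Run one test; any exception counts as a failure of that requirement."""
    try:
        return candidate(*test.args) == test.expected
    except Exception:
        return False


def rank_candidates(
    candidates: dict[str, Callable], tests: list[RequirementTest]
) -> list[tuple[str, int]]:
    """Order candidates by how many requirement-specific tests they pass."""
    scores = {
        name: sum(passes(fn, t) for t in tests) for name, fn in candidates.items()
    }
    # sorted() is stable, so ties keep their original insertion order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Two hypothetical candidates for "return the mean of a list, 0.0 if empty".
def mean_a(xs):
    return sum(xs) / len(xs)  # raises ZeroDivisionError on an empty list


def mean_b(xs):
    return sum(xs) / len(xs) if xs else 0.0


tests = [
    RequirementTest("functional: general case", ([1, 2, 3],), 2.0),
    RequirementTest("robustness: empty input", ([],), 0.0),
]

ranking = rank_candidates({"mean_a": mean_a, "mean_b": mean_b}, tests)
print(ranking)
```

In this toy setup, `mean_b` ranks first because it also satisfies the robustness test that `mean_a` fails. A real pipeline would execute generated code in a sandbox rather than calling functions directly.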
