Abstract: Large Language Models (LLMs) have already become quite proficient at solving simpler programming tasks like those in HumanEval or MBPP benchmarks. However, solving more complex and competitive programming tasks is still quite challenging for these models - possibly due to their tendency to generate solutions as monolithic code blocks instead of decomposing them into logical sub-tasks and sub-modules. On the other hand, experienced programmers instinctively write modularized code with abstraction for solving complex tasks, often reusing previously developed modules. To address this gap, we propose CodeChain, a novel framework for inference that elicits modularized code generation through a chain of self-revisions, each being guided by some representative sub-modules generated in previous iterations. Concretely, CodeChain first instructs the LLM to generate modularized codes through chain-of-thought prompting. Then it applies a chain of self-revisions by iterating the two steps: 1) extracting and clustering the generated sub-modules and selecting the cluster representatives as the more generic and re-usable implementations, and 2) augmenting the original chain-of-thought prompt with these selected module-implementations and instructing the LLM to re-generate new modularized solutions. We find that by naturally encouraging the LLM to reuse the previously developed and verified sub-modules, CodeChain can significantly boost both modularity as well as correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests. It is shown to be effective on both OpenAI LLMs as well as open-sourced LLMs like WizardCoder. We also conduct comprehensive ablation studies with different methods of prompting, number of clusters, model sizes, program qualities, etc., to provide useful insights that underpin CodeChain's success.

What problem does this paper attempt to address?

This paper addresses the difficulty that large language models (LLMs) have when generating code for complex programming tasks. Current LLMs handle simpler tasks well but still struggle with complex, competition-level problems, possibly because they tend to generate monolithic code blocks rather than breaking the task into logical sub-tasks and sub-modules. Experienced programmers, by contrast, usually write modular code and reuse modules they have already developed.

To bridge this gap, the authors propose the CodeChain framework, which guides LLMs to generate modular code through a chain of self-revisions. The main steps of the framework are:

1. **Chain-of-Thought Prompting**:
   - Through chain-of-thought prompting, the LLM is instructed to break the solution into modular parts, each representing a high-level logical sub-task.
   - This encourages the LLM to generate structured code, similar to the way human developers solve problems.
2. **Extract and Cluster Sub-modules**:
   - Sub-modules are extracted from the generated code and clustered.
   - The representative sub-module of each cluster is selected as a more generic and reusable implementation.
3. **Augment the Original Prompt and Regenerate Code**:
   - The selected sub-modules are added to the original chain-of-thought prompt, and the LLM is instructed to regenerate new modular solutions.
   - In this way, the LLM can draw on the modular components of all previously generated samples when producing the next round of solutions.
4. **Iterative Self-Revision**:
   - Repeating steps 2 and 3 over several rounds gradually improves the modularity and correctness of the generated code.

Experimental results show that CodeChain significantly improves the performance of LLMs on complex programming tasks. On the APPS and CodeContests benchmarks, CodeChain achieves relative pass@1 improvements of 35% and 76%, respectively. It also shows consistent improvements on both OpenAI LLMs and open-source LLMs such as WizardCoder.

In conclusion, this paper aims to enhance the code-generation ability of LLMs on complex programming tasks by combining modular generation with iterative self-revision.
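One self-revision round (extract sub-modules, cluster them, pick representatives, augment the prompt) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The text above does not say how sub-modules are clustered or how representatives are chosen, so the sketch swaps in a greedy string-similarity grouping and picks the medoid of each group. It also omits any filtering of samples by test results and the LLM call itself. All function names, the similarity threshold, and the prompt wording are hypothetical.

```python
import ast
from difflib import SequenceMatcher


def extract_submodules(source: str) -> list[str]:
    """Return the source text of every top-level function in a generated sample."""
    tree = ast.parse(source)
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    ]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a learned code embedding."""
    return SequenceMatcher(None, a, b).ratio()


def cluster(modules: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedy grouping (assumption): a module joins the first cluster whose
    first member is at least `threshold` similar, else it starts a new cluster."""
    clusters: list[list[str]] = []
    for module in modules:
        for group in clusters:
            if similarity(module, group[0]) >= threshold:
                group.append(module)
                break
        else:
            clusters.append([module])
    return clusters


def representative(group: list[str]) -> str:
    """Pick the medoid: the member with the highest total similarity to its cluster."""
    return max(group, key=lambda m: sum(similarity(m, other) for other in group))


def augment_prompt(base_prompt: str, representatives: list[str]) -> str:
    """Append the selected sub-modules to the original prompt (hypothetical wording)."""
    joined = "\n\n".join(representatives)
    return f"{base_prompt}\n\nRelevant functions you may reuse or adapt:\n\n{joined}"


def revision_round(base_prompt: str, samples: list[str]) -> str:
    """Build the prompt for the next round from the previous round's samples."""
    modules = [m for sample in samples for m in extract_submodules(sample)]
    reps = [representative(group) for group in cluster(modules)]
    return augment_prompt(base_prompt, reps)
```

In a full loop, the prompt returned by `revision_round` would be sent back to the LLM to produce the next batch of samples, and the round would be repeated a fixed number of times.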

CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation

ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting

Code Optimization Chain-of-Thought: Structured Understanding and Self-Checking

Chain-of-Thought in Neural Code Generation: From and For Lightweight Language Models

UniCoder: Scaling Code Large Language Model via Universal Code

Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation

CompCodeVet: A Compiler-guided Validation and Enhancement Approach for Code Dataset

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

What Makes Large Language Models Reason in (Multi-Turn) Code Generation?

StepCoder: Improving Code Generation with Reinforcement Learning from Compiler Feedback

Structured Chain-of-Thought Prompting for Code Generation

A Pair Programming Framework for Code Generation Via Multi-Plan Exploration and Feedback-Driven Refinement

Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs

CoLadder: Supporting Programmers with Hierarchical Code Generation in Multi-Level Abstraction

MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tasks

Improving Natural Language Capability of Code Large Language Model

CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning