Abstract: Large Language Models (LLMs) are already proficient at solving simpler programming tasks such as those in the HumanEval and MBPP benchmarks. However, solving more complex, competition-level programming tasks remains challenging for these models, possibly because they tend to generate solutions as monolithic code blocks instead of decomposing them into logical sub-tasks and sub-modules. Experienced programmers, by contrast, instinctively write modularized code with abstractions when solving complex tasks, often reusing previously developed modules. To address this gap, we propose CodeChain, a novel inference framework that elicits modularized code generation through a chain of self-revisions, each guided by representative sub-modules generated in previous iterations. Concretely, CodeChain first instructs the LLM to generate modularized code through chain-of-thought prompting. It then applies a chain of self-revisions by iterating two steps: 1) extracting and clustering the generated sub-modules and selecting the cluster representatives as the more generic and reusable implementations, and 2) augmenting the original chain-of-thought prompt with these selected module implementations and instructing the LLM to regenerate new modularized solutions. We find that by naturally encouraging the LLM to reuse previously developed and verified sub-modules, CodeChain significantly boosts both the modularity and the correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests. It is shown to be effective with both OpenAI LLMs and open-source LLMs such as WizardCoder. We also conduct comprehensive ablation studies on prompting methods, number of clusters, model sizes, program quality, and other factors, providing insights into what underpins CodeChain's success.
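The two-step revision loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the embedding or clustering method, so the sketch assumes K-means over L2-normalized identifier-count vectors, takes the member nearest each centroid as that cluster's representative, and omits both the LLM call and any test-based verification of sub-modules. All function names (`extract_submodules`, `embed`, `select_representatives`, `codechain_round`, etc.) and the prompt wording are hypothetical.

```python
import ast
import math
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z_]\w*")  # identifiers and keywords

def extract_submodules(solution: str) -> list[str]:
    """Return the source of each top-level function in one generated solution."""
    try:
        tree = ast.parse(solution)
    except SyntaxError:
        return []  # skip samples that do not parse
    return [ast.get_source_segment(solution, node)
            for node in tree.body if isinstance(node, ast.FunctionDef)]

def embed(code: str, vocab: list[str]) -> list[float]:
    """Assumption: an L2-normalized identifier-count vector stands in for a learned code encoder."""
    counts = Counter(TOKEN.findall(code))
    vec = [float(counts[t]) for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sq_dist(p: list[float], q: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means with a deterministic init (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def select_representatives(modules: list[str], k: int) -> list[str]:
    """Cluster sub-modules and keep, per cluster, the member closest to its centroid."""
    if not modules:
        return []
    vocab = sorted({t for m in modules for t in TOKEN.findall(m)})
    points = [embed(m, vocab) for m in modules]
    k = min(max(k, 1), len(modules))
    centroids, assign = kmeans(points, k)
    reps = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            best = min(members, key=lambda i: sq_dist(points[i], centroids[c]))
            reps.append(modules[best])
    return reps

def codechain_round(base_prompt: str, solutions: list[str], k: int) -> str:
    """One self-revision step: extract, cluster, select, and augment the CoT prompt."""
    modules = [m for s in solutions for m in extract_submodules(s)]
    reps = select_representatives(modules, k)
    return (base_prompt
            + "\n\nConsider reusing or adapting these previously developed sub-modules:\n\n"
            + "\n\n".join(reps) + "\n")

samples = [
    "def read_ints():\n    return list(map(int, input().split()))\n\ndef solve(a):\n    return sum(a)\n",
    "def read_ints():\n    return [int(x) for x in input().split()]\n\ndef solve(xs):\n    return max(xs)\n",
    "def broken(:\n    pass\n",  # unparsable sample is ignored
]
print(codechain_round("Solve the task step by step using modular functions.", samples, k=2))
```

In a full loop, each augmented prompt would be sent back to the LLM to produce the next round of modularized solutions, and the new sub-modules would feed the next clustering step. A learned code encoder would likely replace the token-count vectors in practice, since identifier counts miss semantic similarity between differently named but equivalent functions.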

CodeCoT and Beyond: Learning to Program and Test Like a Developer

CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation

Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

Design of Chain-of-Thought in Math Problem Solving

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Tree-of-Code: A Tree-Structured Exploring Framework for End-to-End Code Generation and Execution in Complex Task Handling

Chain-of-Thought in Neural Code Generation: From and For Lightweight Language Models

COCO: Testing Code Generation Systems via Concretized Instructions

Structured Chain-of-Thought Prompting for Code Generation

CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing

Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback

CompCodeVet: A Compiler-guided Validation and Enhancement Approach for Code Dataset

CodeT: Code Generation with Generated Tests

Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning

Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step

Chain of Thoughtlessness? An Analysis of CoT in Planning

AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

Bridging Code Semantic and LLMs: Semantic Chain-of-Thought Prompting for Code Generation

Code Optimization Chain-of-Thought: Structured Understanding and Self-Checking

CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules

ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models