Abstract: Large Language Models (LLMs) have already become quite proficient at solving simpler programming tasks like those in the HumanEval or MBPP benchmarks. However, solving more complex and competitive programming tasks is still quite challenging for these models, possibly due to their tendency to generate solutions as monolithic code blocks instead of decomposing them into logical sub-tasks and sub-modules. Experienced programmers, on the other hand, instinctively write modularized code with abstraction when solving complex tasks, often reusing previously developed modules. To address this gap, we propose CodeChain, a novel inference framework that elicits modularized code generation through a chain of self-revisions, each guided by representative sub-modules generated in previous iterations. Concretely, CodeChain first instructs the LLM to generate modularized code through chain-of-thought prompting. It then applies a chain of self-revisions by iterating two steps: 1) extracting and clustering the generated sub-modules and selecting the cluster representatives as the more generic and reusable implementations, and 2) augmenting the original chain-of-thought prompt with these selected module implementations and instructing the LLM to regenerate new modularized solutions. We find that by naturally encouraging the LLM to reuse the previously developed and verified sub-modules, CodeChain can significantly boost both the modularity and the correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests. It is shown to be effective on both OpenAI LLMs and open-source LLMs like WizardCoder. We also conduct comprehensive ablation studies with different prompting methods, numbers of clusters, model sizes, program qualities, etc., to provide useful insights that underpin CodeChain's success.
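
The two-step revision loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how sub-modules are embedded or clustered, so the bag-of-tokens embedding, the small k-means routine, and the helper names (`extract_functions`, `select_representatives`, `build_revision_prompt`) are all assumptions made for illustration. The LLM call itself is left out; the sketch only covers selecting representative sub-modules and building the augmented prompt.

```python
import ast
import math
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z_]\w*")


def extract_functions(program):
    """Return the source text of every top-level function in a Python program."""
    tree = ast.parse(program)
    return [ast.get_source_segment(program, node)
            for node in tree.body if isinstance(node, ast.FunctionDef)]


def embed(code, vocab):
    """Toy embedding (assumption): L2-normalised identifier counts."""
    counts = Counter(TOKEN.findall(code))
    vec = [float(counts[tok]) for tok in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain k-means, initialised deterministically from the first k points."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_representatives(programs, k):
    """Step 1: extract sub-modules, cluster them, keep one per cluster."""
    modules = [f for p in programs for f in extract_functions(p)]
    vocab = sorted({t for m in modules for t in TOKEN.findall(m)})
    vectors = [embed(m, vocab) for m in modules]
    k = min(k, len(modules))
    labels, centroids = kmeans(vectors, k)
    reps = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            # Representative = member closest to the cluster centroid.
            best = min(idx, key=lambda i: sq_dist(vectors[i], centroids[c]))
            reps.append(modules[best])
    return reps


def build_revision_prompt(base_prompt, representatives):
    """Step 2: augment the original prompt with the selected sub-modules."""
    shown = "\n\n".join(representatives)
    return (f"{base_prompt}\n\n"
            f"# Relevant functions from earlier attempts (reuse or adapt):\n"
            f"{shown}\n")
```

In a full loop, the prompt returned by `build_revision_prompt` would be sent back to the model, and the newly generated programs would feed the next call to `select_representatives`. A learned code embedding would be a natural replacement for the toy `embed` function.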

IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators

Code Translation with Compiler Representations

Bridging the Language Gap: Enhancing Multilingual Prompt-Based Code Generation in LLMs via Zero-Shot Cross-Lingual Transfer

Evaluating In-Context Learning of Libraries for Code Generation

Multilingual Code Co-Evolution Using Large Language Models

Code-mixed LLM: Improve Large Language Models' Capability to Handle Code-Mixing through Reinforcement Learning from AI Feedback

UniCoder: Scaling Code Large Language Model via Universal Code

Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code

If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

CodeGen2: Lessons for Training LLMs on Programming and Natural Languages

CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules

Large Language Models for cross-language code clone detection

Multi-Programming Language Ensemble for Code Generation in Large Language Model

Unseen Horizons: Unveiling the Real Capability of LLM Code Generation Beyond the Familiar

Unraveling the Potential of Large Language Models in Code Translation: How Far Are We?

Improving Natural Language Capability of Code Large Language Model

Enhancing Code Translation in Language Models with Few-Shot Learning via Retrieval-Augmented Generation

AI-assisted Code Authoring at Scale: Fine-tuning, deploying, and mixed methods evaluation

Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction