CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules

Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty
2024-03-14
Abstract: Large Language Models (LLMs) have already become quite proficient at solving simpler programming tasks like those in HumanEval or MBPP benchmarks. However, solving more complex and competitive programming tasks is still quite challenging for these models - possibly due to their tendency to generate solutions as monolithic code blocks instead of decomposing them into logical sub-tasks and sub-modules. On the other hand, experienced programmers instinctively write modularized code with abstraction for solving complex tasks, often reusing previously developed modules. To address this gap, we propose CodeChain, a novel framework for inference that elicits modularized code generation through a chain of self-revisions, each being guided by some representative sub-modules generated in previous iterations. Concretely, CodeChain first instructs the LLM to generate modularized codes through chain-of-thought prompting. Then it applies a chain of self-revisions by iterating the two steps: 1) extracting and clustering the generated sub-modules and selecting the cluster representatives as the more generic and re-usable implementations, and 2) augmenting the original chain-of-thought prompt with these selected module-implementations and instructing the LLM to re-generate new modularized solutions. We find that by naturally encouraging the LLM to reuse the previously developed and verified sub-modules, CodeChain can significantly boost both modularity as well as correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests. It is shown to be effective on both OpenAI LLMs as well as open-sourced LLMs like WizardCoder. We also conduct comprehensive ablation studies with different methods of prompting, number of clusters, model sizes, program qualities, etc., to provide useful insights that underpin CodeChain's success.
Artificial Intelligence, Computation and Language, Programming Languages
What problem does this paper attempt to address?
This paper attempts to address the challenges that large language models (LLMs) encounter when generating code for complex programming tasks. Specifically, current LLMs perform poorly on such tasks, mainly because they tend to generate monolithic code blocks rather than breaking the task into logical subtasks and sub-modules. Experienced programmers work differently: they usually write modular code and reuse modules they have already developed.

To bridge this gap, the authors propose the CodeChain framework. CodeChain guides LLMs to generate modular code through a series of self-revision steps. The main steps of the framework are:

1. **Chain-of-Thought Prompting**:
   - Through chain-of-thought prompting, LLMs are instructed to break the solution into modular parts. Each module represents a high-level logical subtask.
   - This method encourages LLMs to generate structured code, similar to the way human developers solve problems.
2. **Extract and Cluster Sub-modules**:
   - Sub-modules are extracted from the generated code and then clustered.
   - The most representative sub-module in each cluster is selected as a general and reusable implementation.
3. **Enhance the Original Prompt and Regenerate Code**:
   - The selected sub-modules are added to the original chain-of-thought prompt to guide LLMs to regenerate a new modular solution.
   - In this way, LLMs can utilize the collective insights of modular components from all previously generated samples, thereby improving future generations.
4. **Iterative Self-Revision**:
   - By iterating the above process multiple times, the modularity and correctness of the generated code are gradually improved.

Experimental results show that CodeChain significantly improves the performance of LLMs on complex programming tasks. For example, on the APPS and CodeContests benchmarks, CodeChain achieves relative pass@1 improvements of 35% and 76%, respectively. In addition, CodeChain shows consistent improvement on both OpenAI LLMs and open-source LLMs such as WizardCoder.

In conclusion, this paper aims to enhance the code-generation ability of LLMs on complex programming tasks by introducing modular and iterative self-revision methods.
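The extract, cluster, select, and re-prompt loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the `generate` callable is a hypothetical stand-in for an LLM call, token-set Jaccard similarity replaces whatever code representation the paper uses, the greedy threshold clustering and medoid selection are simple substitutes for the paper's clustering and representative selection, and the prompt wording in `revise_prompt` is invented.

```python
import ast
import re
from typing import Callable


def extract_functions(source: str) -> list[str]:
    """Return the source of every top-level function definition in `source`."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return []  # skip generated samples that do not parse
    return [ast.unparse(n) for n in tree.body if isinstance(n, ast.FunctionDef)]


def jaccard(a: str, b: str) -> float:
    """Token-set similarity; an assumed, crude stand-in for embedding similarity."""
    ta = set(re.findall(r"[A-Za-z_]\w*", a))
    tb = set(re.findall(r"[A-Za-z_]\w*", b))
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def cluster(modules: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy clustering: a module joins the first cluster whose seed is similar enough."""
    clusters: list[list[str]] = []
    for m in modules:
        for c in clusters:
            if jaccard(m, c[0]) >= threshold:
                c.append(m)
                break
        else:
            clusters.append([m])  # no similar cluster found, so start a new one
    return clusters


def representative(members: list[str]) -> str:
    """Pick the medoid: the member with the highest total similarity to its cluster."""
    return max(members, key=lambda m: sum(jaccard(m, o) for o in members))


def revise_prompt(base_prompt: str, reps: list[str]) -> str:
    """Append the representative sub-modules to the original prompt (wording is hypothetical)."""
    return base_prompt + "\n\nRelevant functions you may reuse:\n\n" + "\n\n".join(reps)


def codechain(generate: Callable[[str], list[str]], base_prompt: str, rounds: int = 2) -> list[str]:
    """Generate once, then repeat: extract -> cluster -> select -> re-prompt."""
    samples = generate(base_prompt)
    for _ in range(rounds):
        modules = [f for s in samples for f in extract_functions(s)]
        if not modules:
            break  # nothing to reuse, so stop revising
        reps = [representative(c) for c in cluster(modules)]
        samples = generate(revise_prompt(base_prompt, reps))
    return samples
```

In use, `generate` would wrap a model call that returns several candidate programs for a prompt, and `codechain(generate, prompt, rounds=5)` would return the samples from the final revision round. Choosing a medoid rather than an arbitrary cluster member is one simple way to favor the most typical implementation of a sub-task, which matches the summary's description of the representative as a general and reusable implementation.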