Planning-Driven Programming: A Large Language Model Programming Workflow

Chao Lei, Yanchuan Chang, Nir Lipovetzky, Krista A. Ehinger
2024-11-21
Abstract: The strong performance of large language models (LLMs) on natural language processing tasks raises extensive discussion on their application to code generation. Recent work suggests multiple sampling approaches to improve initial code generation accuracy or program repair approaches to refine the code. However, these methods suffer from LLMs' inefficiencies and limited reasoning capacity. In this work, we propose an LLM programming workflow (LPW) designed to improve both initial code generation and subsequent refinements within a structured two-phase workflow. Specifically, in the solution generation phase, the LLM first outlines a solution plan that decomposes the problem into manageable sub-problems and then verifies the generated solution plan through visible test cases. Subsequently, in the code implementation phase, the LLM initially drafts code according to the solution plan and its verification. If the generated code fails the visible tests, the plan verification serves as the intended natural language solution to inform the refinement process for correcting bugs. We further introduce SLPW, a sampling variant of LPW, which initially generates multiple solution plans and plan verifications, produces a program for each plan and its verification, and refines each program as necessary until one successfully passes the visible tests. Compared to the state-of-the-art methods across various existing LLMs, our experimental results show that LPW significantly improves the Pass@1 accuracy by up to 16.4% on well-established text-to-code generation benchmarks, especially with a notable improvement of around 10% on challenging benchmarks. Additionally, SLPW demonstrates up to a 5.6% improvement over LPW and sets new state-of-the-art Pass@1 accuracy on various benchmarks, e.g., 98.2% on HumanEval, 84.8% on MBPP, 64.0% on APPS, and 35.3% on CodeContest, using GPT-4o as the backbone.
Software Engineering, Artificial Intelligence
What problem does this paper attempt to address?
This paper targets the shortcomings of large language models (LLMs) in code generation, in particular their inefficiency and limited reasoning capacity. Existing methods improve initial code generation accuracy through multiple sampling or program repair, but they have limitations: sampling is inefficient, the approaches conflict with human programming strategies, and the feedback they produce lacks precise guidance for correction. In multi-agent collaborative code generation, ineffective feedback mechanisms also degrade communication quality, especially when many agents are involved, which in turn increases token consumption.
To address these problems, the authors propose LPW (Large Language Model Programming Workflow), which aims to improve both initial code generation and the subsequent refinement process. LPW is a structured two-phase workflow:
1. **Solution generation phase**: The LLM first outlines a solution plan that decomposes the problem into manageable sub-problems, then verifies the plan against the visible test cases.
2. **Code implementation phase**: The LLM drafts code according to the solution plan and its verification. If the code fails the visible tests, the plan verification serves as the intended natural-language solution and guides the refinement process that corrects the bugs.
The authors also introduce SLPW (Sampling LPW), a sampling variant of LPW. It generates multiple solution plans and plan verifications in the solution generation phase, produces a program for each plan and its verification, and refines each program as necessary until one passes the visible tests.
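The two phases and the sampling variant can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not the authors' implementation: `llm` is a hypothetical callable that maps a prompt string to a text response, the prompt formats, the `solve` entry point, the refinement budget, and the order in which SLPW tries candidate plans are all invented for the example, and running candidate code with `exec` stands in for what should be a sandboxed executor.

```python
from typing import Callable, List, Optional, Tuple

LLM = Callable[[str], str]          # hypothetical: prompt in, text out
Test = Tuple[tuple, object]         # (arguments, expected result)


def run_tests(code: str, tests: List[Test], entry: str = "solve") -> bool:
    """Return True if the candidate program passes every visible test."""
    namespace: dict = {}
    try:
        exec(code, namespace)       # assumption: a real system would sandbox this
        return all(namespace[entry](*args) == want for args, want in tests)
    except Exception:
        return False


def generate_solution(llm: LLM, problem: str, tests: List[Test],
                      variant: int = 0) -> Tuple[str, str]:
    """Phase 1: outline a plan, then verify it against the visible tests."""
    plan = llm(f"PLAN\nproblem: {problem}\nvariant: {variant}")
    verification = llm(f"VERIFY\nplan: {plan}\ntests: {tests}")
    return plan, verification


def implement(llm: LLM, plan: str, verification: str, tests: List[Test],
              max_refinements: int = 3) -> Optional[str]:
    """Phase 2: draft code, then refine it using the plan verification."""
    code = llm(f"CODE\nplan: {plan}\nverification: {verification}")
    for _ in range(max_refinements):
        if run_tests(code, tests):
            return code
        # The plan verification acts as the intended natural-language solution.
        code = llm(f"REFINE\ncode: {code}\nintended behaviour: {verification}")
    return code if run_tests(code, tests) else None


def lpw(llm: LLM, problem: str, tests: List[Test],
        max_refinements: int = 3) -> Optional[str]:
    """Single plan, single program, refined until it passes or the budget ends."""
    plan, verification = generate_solution(llm, problem, tests)
    return implement(llm, plan, verification, tests, max_refinements)


def slpw(llm: LLM, problem: str, tests: List[Test], num_plans: int = 3,
         max_refinements: int = 3) -> Optional[str]:
    """Sampling variant: several plans up front, first passing program wins."""
    solutions = [generate_solution(llm, problem, tests, v)
                 for v in range(num_plans)]
    for plan, verification in solutions:
        code = implement(llm, plan, verification, tests, max_refinements)
        if code is not None:
            return code
    return None
```

Both functions return `None` when no program passes the visible tests within the budget; how the actual workflow handles that case is not stated in this summary.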
The experimental results show that LPW improves Pass@1 accuracy by up to 16.4% over existing state-of-the-art methods, with a gain of around 10% on challenging benchmarks. SLPW improves on LPW by up to a further 5.6% and sets new state-of-the-art Pass@1 accuracy on several benchmarks, e.g., 98.2% on HumanEval, 84.8% on MBPP, 64.0% on APPS, and 35.3% on CodeContest, using GPT-4o as the backbone.
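For readers unfamiliar with the metric, the snippet below shows how a Pass@1 figure of this kind is typically computed. It assumes the usual definition (the percentage of problems whose single submitted program passes all evaluation tests); this summary does not spell out the definition, and the function name and input format are hypothetical.

```python
from typing import Sequence


def pass_at_1(solved: Sequence[bool]) -> float:
    """Percentage of problems whose one submitted program passed all tests."""
    if not solved:
        raise ValueError("need at least one problem")
    return 100.0 * sum(solved) / len(solved)


# Example: 3 of 4 problems solved on the first submission.
score = pass_at_1([True, False, True, True])
```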