Abstract: The strong performance of large language models (LLMs) on natural language processing tasks has prompted extensive discussion of their application to code generation. Recent work proposes multiple-sampling approaches to improve initial code generation accuracy, or program-repair approaches to refine the generated code. However, these methods suffer from LLMs' inefficiencies and limited reasoning capacity. In this work, we propose an LLM programming workflow (LPW) designed to improve both initial code generation and subsequent refinements within a structured two-phase workflow. Specifically, in the solution generation phase, the LLM first outlines a solution plan that decomposes the problem into manageable sub-problems and then verifies the generated plan against the visible test cases. Subsequently, in the code implementation phase, the LLM drafts code according to the solution plan and its verification. If the generated code fails the visible tests, the plan verification serves as the intended natural language solution and guides the refinement process in correcting bugs. We further introduce SLPW, a sampling variant of LPW, which first generates multiple solution plans and plan verifications, produces a program for each plan and its verification, and refines each program as necessary until one passes the visible tests. Compared with state-of-the-art methods across various existing LLMs, our experimental results show that LPW improves Pass@1 accuracy by up to 16.4% on well-established text-to-code generation benchmarks, with a notable improvement of around 10% on challenging benchmarks. Additionally, SLPW achieves up to a 5.6% improvement over LPW and sets new state-of-the-art Pass@1 accuracy on various benchmarks, e.g., 98.2% on HumanEval, 84.8% on MBPP, 64.0% on APPS, and 35.3% on CodeContest, using GPT-4o as the backbone.
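The two-phase control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Plan`, `passes`, `lpw`, `slpw`, and `StubLLM`, the `max_refinements` and `num_plans` parameters, and the representation of programs as Python callables are all hypothetical. The stub stands in for a real model, and `slpw` is simplified to try sampled plans one after another rather than generating them all up front.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Test = Tuple[tuple, object]  # (input arguments, expected output)


@dataclass
class Plan:
    steps: str          # natural-language solution plan
    verification: str   # the plan walked through the visible tests


def passes(program: Callable, tests: List[Test]) -> bool:
    """Return True if the program gives the expected output on every visible test."""
    for args, expected in tests:
        try:
            if program(*args) != expected:
                return False
        except Exception:
            return False
    return True


def lpw(problem, tests, llm, max_refinements=3) -> Optional[Callable]:
    # Phase 1 (solution generation): plan, then verify the plan on the visible tests.
    steps = llm.make_plan(problem)
    plan = Plan(steps, llm.verify_plan(problem, steps, tests))
    # Phase 2 (code implementation): draft, then refine while visible tests fail.
    program = llm.draft_code(problem, plan)
    for _ in range(max_refinements):
        if passes(program, tests):
            return program
        program = llm.refine_code(problem, plan, program, tests)
    return program if passes(program, tests) else None


def slpw(problem, tests, llm, num_plans=3, max_refinements=3) -> Optional[Callable]:
    # Simplified sampling variant: try several plans, return the first passing program.
    for _ in range(num_plans):
        program = lpw(problem, tests, llm, max_refinements)
        if program is not None:
            return program
    return None


class StubLLM:
    """Hypothetical stand-in for a model: the first draft is buggy, the refinement is not."""

    def make_plan(self, problem):
        return "1. Iterate over the list. 2. Accumulate the sum."

    def verify_plan(self, problem, steps, tests):
        return "For [1, 2, 3]: 0 -> 1 -> 3 -> 6, matching the expected output 6."

    def draft_code(self, problem, plan):
        return lambda xs: sum(xs[1:])   # deliberate bug: skips the first element

    def refine_code(self, problem, plan, program, tests):
        return lambda xs: sum(xs)       # repaired version


visible_tests = [(([1, 2, 3],), 6), (([],), 0)]
solution = lpw("Return the sum of a list.", visible_tests, StubLLM())
print(solution([4, 5]))
```

In this toy run the first draft fails the visible test on `[1, 2, 3]`, so one refinement step is taken before a passing program is returned. In a real system each `StubLLM` method would be a prompt to the backbone model, with the plan verification included in the refinement prompt.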

PDC & DM-SFT: A Road for LLM SQL Bug-Fix Enhancing

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL

PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL

SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended)

Impact of Large Language Models of Code on Fault Localization

LDB: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step

Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation

Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm

Leveraging Prior Experience: An Expandable Auxiliary Knowledge Base for Text-to-SQL

DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning

Enhancing LLM Fine-tuning for Text-to-SQLs by SQL Quality Measurement

DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning

Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM

Planning-Driven Programming: A Large Language Model Programming Workflow

Large Language Models of Code Fail at Completing Code with Potential Bugs

Effective Large Language Model Debugging with Best-first Tree Search

Exploring the Potential of Pre-Trained Language Models of Code for Automated Program Repair

Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems

A Deep Dive into Large Language Models for Automated Bug Localization and Repair

SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights