Abstract: Large language models (LLMs) have recently achieved impressive performance in code generation, offering programmers revolutionary assistance in software development. However, because LLMs are auto-regressive, they are susceptible to error accumulation during code generation. Once an error is produced, an LLM can only continue to generate the subsequent code conditioned on it, since it cannot adjust its previous outputs. Existing LLM-based approaches typically rely on post-revising after code generation, which makes accumulated errors hard to resolve and wastes significant resources. Ideally, LLMs should roll back and resolve an error as soon as it occurs during code generation, rather than proceed on the basis of the error and wait for post-revising after generation. In this paper, we propose ROCODE, which integrates a backtracking mechanism and program analysis into LLMs for code generation. Specifically, we employ program analysis to perform incremental error detection during the generation process. When an error is detected, the backtracking mechanism is triggered to prime rollback strategies and constrained regeneration, thereby eliminating the error early and ensuring that generation continues on a correct basis. Experiments on multiple code generation benchmarks show that ROCODE can significantly reduce the errors generated by LLMs, with a compilation pass rate of 99.1%. The test pass rate is improved by up to 23.8% compared to the best baseline approach. Compared to the post-revising baseline, the token cost is reduced by 19.3%. Moreover, our approach is model-agnostic and achieves consistent improvements across nine representative LLMs.
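The abstract does not say which program analysis ROCODE runs on partially generated code. As a hedged illustration of what "incremental" detection means, the sketch below uses a far simpler lexical check than the paper's analysis: it scans a Python prefix and reports the first closing bracket that can never be matched (an error no continuation can repair), while treating still-open brackets as acceptable because the model may close them later. The function name `first_definite_error` and its simplifications (only single-line `'...'` / `"..."` strings and `#` comments are recognized) are assumptions for illustration.

```python
from __future__ import annotations

# Hypothetical helper, not ROCODE's analysis: a lexical prefix check only.
# Closing bracket -> the opening bracket it must match.
CLOSERS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(CLOSERS.values())


def first_definite_error(prefix: str) -> int | None:
    """Index of the first character in `prefix` that no continuation can fix
    (a closing bracket with no matching opener), or None if still viable.

    Unclosed brackets are not errors: the model may close them later.
    Simplification: only '...' / "..." strings and # comments are skipped;
    triple-quoted strings and f-string internals are not modeled.
    """
    stack: list[str] = []
    quote: str | None = None  # delimiter of the string we are inside, if any
    in_comment = False
    i = 0
    while i < len(prefix):
        ch = prefix[i]
        if in_comment:
            in_comment = ch != "\n"  # a comment ends at the newline
        elif quote is not None:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch == "#":
            in_comment = True
        elif ch in ("'", '"'):
            quote = ch
        elif ch in OPENERS:
            stack.append(ch)
        elif ch in CLOSERS:
            if not stack or stack[-1] != CLOSERS[ch]:
                return i  # mismatch: appending more text cannot repair this
            stack.pop()
        i += 1
    return None


if __name__ == "__main__":
    print(first_definite_error("def f(x):\n    return [x, (x + 1"))  # None
    print(first_definite_error("print(a[0)]"))  # 9
```

In a generation loop, a check like this would run after each new token; a non-`None` result locates where the prefix went wrong and is the kind of signal that could trigger backtracking.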

What problem does this paper attempt to address?

The paper addresses error accumulation in code generation, caused by the autoregressive nature of large language models (LLMs). Once an error appears during generation, an LLM can only continue generating subsequent code conditioned on it, because it cannot revise its earlier output. Errors therefore accumulate and compound, and the generated code can drift far from the intended solution. Existing LLM-based methods typically revise code only after generation is complete, which makes accumulated errors hard to correct and wastes significant resources.

To solve these problems, the paper proposes ROCODE, which integrates a backtracking mechanism and program analysis to detect errors as generation proceeds, roll back to a point before the error, and regenerate the code. This prevents errors from accumulating and improves both the quality and the efficiency of code generation. Specifically, ROCODE works in three steps:

1. **Incremental Error Detection**: Program analysis continuously checks the partially generated code so that errors are identified as soon as they occur.
2. **Strategic Backtracking**: When an error is detected, a rollback strategy selects a backtracking point, an earlier error-free state, and generation resumes from there.
3. **Constrained Regeneration**: Constraints are applied during regeneration to prevent the same error from recurring.

Through these mechanisms, ROCODE detects and corrects errors during generation rather than after it, which lowers the error rate of the generated code, raises the compilation and test pass rates, and reduces generation cost.
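The three steps can be pictured as a single decoding loop. This is a minimal, hypothetical sketch, not ROCODE's implementation: `propose` stands in for an LLM by returning ranked candidate tokens, `has_error` stands in for the incremental program analysis, the rollback strategy simply removes the offending token (backing up further when every candidate at a position has failed), and the regeneration constraint is a per-position ban on tokens that previously caused an error. All names (`generate_with_backtracking`, `bracket_error`, `toy_model`) are illustrative assumptions.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical interfaces (illustration only):
Proposer = Callable[[list[str]], list[str]]  # prefix -> candidate next tokens, best first
Checker = Callable[[list[str]], bool]        # prefix -> True if it holds an unfixable error


def generate_with_backtracking(
    propose: Proposer,
    has_error: Checker,
    max_tokens: int = 50,
    max_backtracks: int = 20,
    eos: str = "<eos>",
) -> list[str]:
    """Greedy decoding with incremental checking, rollback, and token bans."""
    tokens: list[str] = []
    # banned[i]: tokens ruled out at position i, given the current tokens[:i].
    banned: dict[int, set[str]] = {}
    backtracks = 0
    while len(tokens) < max_tokens:
        pos = len(tokens)
        allowed = [t for t in propose(tokens) if t not in banned.get(pos, set())]
        if not allowed:
            # Every candidate here failed: roll back one more token.
            if not tokens or backtracks >= max_backtracks:
                break
            banned.pop(pos, None)  # these bans depended on the prefix being removed
            prev = tokens.pop()
            banned.setdefault(pos - 1, set()).add(prev)
            backtracks += 1
            continue
        tok = allowed[0]  # greedy choice among the tokens still allowed
        if tok == eos:
            break
        tokens.append(tok)
        if has_error(tokens):
            # Check failed: undo this token and forbid it at this position,
            # so regeneration cannot repeat the same mistake.
            tokens.pop()
            banned.setdefault(pos, set()).add(tok)
            backtracks += 1
            if backtracks >= max_backtracks:
                break
    return tokens


def bracket_error(prefix: list[str]) -> bool:
    """Toy checker: True if some closing bracket cannot match an opener."""
    pairs = {")": "(", "]": "["}
    stack: list[str] = []
    for t in prefix:
        if t in ("(", "["):
            stack.append(t)
        elif t in pairs and (not stack or stack.pop() != pairs[t]):
            return True
    return False


def toy_model(prefix: list[str]) -> list[str]:
    """Toy 'LLM' whose first choice after 'x' is the wrong bracket."""
    table = {
        (): ["f"],
        ("f",): ["("],
        ("f", "("): ["x"],
        ("f", "(", "x"): ["]", ")"],
        ("f", "(", "x", ")"): ["<eos>"],
    }
    return table.get(tuple(prefix), ["<eos>"])


if __name__ == "__main__":
    print("".join(generate_with_backtracking(toy_model, bracket_error)))  # f(x)
```

Running the script prints `f(x)`: the toy model first proposes `]`, the check rejects it immediately, and the ban forces its second choice `)`. Keying bans by position, and discarding them when the loop backs up past that position, keeps each constraint tied to the prefix under which it was learned.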
Experimental results show that ROCODE performs well across multiple code generation benchmarks: it achieves a compilation pass rate of 99.1%, improves the test pass rate by up to 23.8% over the best baseline method, and reduces token cost by 19.3% compared with the post-revising baseline.

ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Models for Code Generation

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

Fixing Code Generation Errors for Large Language Models

Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

Planning-Driven Programming: A Large Language Model Programming Workflow

The First Prompt Counts the Most! An Evaluation of Large Language Models on Iterative Example-based Code Generation

RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance

Where Do Large Language Models Fail When Generating Code?

A Survey on Evaluating Large Language Models in Code Generation Tasks

Self-planning Code Generation with Large Language Models

A Self-Iteration Code Generation Method Based on Large Language Models

Evaluating Large Language Models in Class-Level Code Generation

Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach

On the Effectiveness of Large Language Models in Domain-Specific Code Generation

Effi-Code: Unleashing Code Efficiency in Language Models

ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models

Multi-Programming Language Ensemble for Code Generation in Large Language Model