Self-Edit: Fault-Aware Code Editor for Code Generation

Kechi Zhang, Zhuo Li, Jia Li, Ge Li, Zhi Jin
2023-09-11
Abstract: Large language models (LLMs) have demonstrated an impressive ability to generate code on competitive programming tasks. However, with limited sample numbers, LLMs still suffer from poor accuracy. Inspired by the process of human programming, we propose a generate-and-edit approach named Self-Edit that utilizes execution results of the generated code from LLMs to improve the code quality on the competitive programming task. We execute the generated code on the example test case provided in the question and wrap execution results into a supplementary comment. Utilizing this comment as guidance, our fault-aware code editor is employed to correct errors in the generated code. We perform extensive evaluations across two competitive programming datasets with nine different LLMs. Compared to directly generating from LLMs, our approach can improve the average of pass@1 by 89% on APPS-dev, 31% on APPS-test, and 48% on HumanEval over nine popular code generation LLMs with parameter sizes ranging from 110M to 175B. Compared to other post-processing methods, our method demonstrates superior accuracy and efficiency.
Software Engineering, Computation and Language
What problem does this paper attempt to address?
The paper addresses the low accuracy of large language models (LLMs) on competitive programming tasks. Although these models can generate plausible code, their accuracy remains poor when only a small number of samples can be drawn. To address this, the authors propose Self-Edit, a generate-and-edit approach that uses execution results to improve code quality. Self-Edit mimics the process human programmers follow when solving a problem:

1. **Generation**: Use a large language model to generate initial code from the problem description.
2. **Execution**: Run the generated code on the example test case provided in the problem and wrap the execution results into a supplementary comment.
3. **Editing**: Use a fault-aware neural code editor, guided by the generated code and the supplementary comment, to correct errors.

Experimental results show that, compared to generating directly from LLMs, Self-Edit improves average pass@1 by 89% on APPS-dev, 31% on APPS-test, and 48% on HumanEval. The method was evaluated with nine LLMs ranging from 110M to 175B parameters, which supports its effectiveness and generality. In particular, under a limited sampling budget, Self-Edit outperforms existing reranking methods.
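
The generate, execute, and edit steps can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`run_on_example`, `build_comment`, `self_edit`), the wording of the supplementary comment, and the choice to skip the editor when the example test already passes are all assumptions made for illustration. The `generate` and `edit` callables stand in for an LLM and a fault-aware code editor, which are not modeled here.

```python
import subprocess
import sys
from typing import Callable, Tuple


def run_on_example(code: str, stdin_text: str, timeout: float = 5.0) -> Tuple[str, str]:
    """Run candidate code in a subprocess; return (stdout, error_message).

    error_message is empty when the program exits cleanly.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "", "time limit exceeded"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        # Keep only the last traceback line, e.g. "ZeroDivisionError: ...".
        return proc.stdout, lines[-1] if lines else f"exit code {proc.returncode}"
    return proc.stdout, ""


def build_comment(stdout: str, error: str, expected: str) -> Tuple[bool, str]:
    """Wrap the execution result into a supplementary comment.

    The comment wording is hypothetical, not the paper's template.
    """
    if error:
        return False, f"# Execution failed: {error}"
    if stdout.strip() == expected.strip():
        return True, "# Passed the example test case."
    return False, f"# Wrong answer: expected {expected.strip()!r}, got {stdout.strip()!r}"


def self_edit(
    problem: str,
    generate: Callable[[str], str],
    edit: Callable[[str, str, str], str],
    example_input: str,
    example_output: str,
) -> str:
    """Generate code, execute it on the example test, then edit if it fails."""
    code = generate(problem)
    stdout, error = run_on_example(code, example_input)
    passed, comment = build_comment(stdout, error, example_output)
    if passed:
        # Assumption of this sketch: code that passes the example is kept as-is.
        return code
    return edit(problem, code, comment)
```

In use, `generate` would wrap a call to a code generation model and `edit` would wrap the editor model, receiving the problem description, the faulty code, and the comment as its inputs. Running untrusted generated code through a plain subprocess is only suitable for a demonstration; a real evaluation would need sandboxing.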