SelfEvolve: A Code Evolution Framework via Large Language Models

Shuyang Jiang, Yuhao Wang, Yu Wang
2023-06-05
Abstract: Large language models (LLMs) have already revolutionized code generation, after being pretrained on publicly available code data. However, while various methods have been proposed to augment LLMs with retrieved knowledge and enhance the quality of code generation, the performance of these retrieval-based methods is limited by the strength of the retrievers used. In addition, while LLMs show great emergent ability, they still struggle to produce the correct code in one turn. To address these challenges, we propose a novel two-step pipeline, called SelfEvolve, that leverages LLMs as both knowledge providers and self-reflective programmers. Unlike retrieval-based methods, SelfEvolve obtains the knowledge from input prompts and generates intermediate code based on the generated knowledge. After that, SelfEvolve asks LLM to act as an expert programmer to perform debugging for the generated code. This is achieved by receiving the error message from the interpreter, without requiring special test cases for correctness verification. We evaluate SelfEvolve on three code generation datasets, including DS-1000 for data science code, HumanEval for software engineering code, and TransCoder for C++-to-Python translation. Our empirical experiments show that SelfEvolve outperforms strong baselines by a significant margin on all datasets. We also conduct exhaustive analytical experiments to validate the effectiveness of the two stages of SelfEvolve, and find that both are superior to other prompting-based methods. Further scalability analysis demonstrates that SelfEvolve can be adapted to other more advanced models, such as GPT-4, and bring consistent efficacy improvement.
Computation and Language, Software Engineering
What problem does this paper attempt to address?
The paper addresses two limitations of existing code-generation methods built on large language models (LLMs). First, although pretraining on publicly available code has greatly improved generation quality, LLMs still often fail to produce correct code in a single attempt. Second, retrieval-based methods can supply LLMs with extra knowledge, but their performance is capped by the strength of the retriever, and they can suffer from domain mismatch when adapted to different tasks. To overcome these problems, the paper proposes a framework named SelfEvolve, which improves code generation in two stages:
1. **Knowledge Generation**: SelfEvolve uses the LLM itself as the knowledge provider. It derives the relevant knowledge from the input prompt and then generates intermediate code based on that knowledge. Unlike traditional retrieval-based methods, it does not rely on large external knowledge bases or specialized retrieval tools; the knowledge is extracted directly from the LLM.
2. **Code Debugging**: After the intermediate code is generated, SelfEvolve asks the LLM to act as an expert programmer and debug it. The LLM receives the error message from the interpreter, so no special test cases have to be prepared for correctness verification. Since no extra test cases are written, none have to be checked for correctness themselves, and the process is closer to real programming practice, where code is revised based on actual error feedback.

SelfEvolve is designed to improve the accuracy of code generation, reduce the likelihood of producing incorrect code, and perform well across different code-generation tasks.
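
The two stages above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the `llm` callable, the prompt wording, the `max_rounds` limit, and the `exec`-based runner are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import traceback
from typing import Callable, Optional

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def run_code(code: str) -> Optional[str]:
    """Run code in a fresh namespace; return the traceback text, or None on success."""
    try:
        exec(code, {"__name__": "__sketch__"})
        return None
    except Exception:
        return traceback.format_exc()


def self_evolve(task: str, llm: LLM, max_rounds: int = 3) -> str:
    # Stage 1: the model states the knowledge it needs, then writes code using it.
    # The prompt strings below are illustrative, not the paper's prompts.
    knowledge = llm(f"List the APIs and facts needed to solve this task:\n{task}")
    code = llm(f"Task:\n{task}\n\nRelevant knowledge:\n{knowledge}\n\nWrite the code.")

    # Stage 2: feed interpreter errors back until the code runs or rounds run out.
    for _ in range(max_rounds):
        error = run_code(code)
        if error is None:
            break
        code = llm(f"Task:\n{task}\n\nCode:\n{code}\n\nError:\n{error}\n\nFix the code.")
    return code
```

Note that the only feedback signal here is whether the interpreter raises an error, which mirrors the idea of debugging without prepared test cases. A real setup would run generated code in a sandbox rather than calling `exec` in the host process.
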
The paper evaluates SelfEvolve on three code-generation datasets: DS-1000 for data-science code, HumanEval for general-purpose code, and TransCoder for C++-to-Python translation. SelfEvolve outperforms strong baselines by a significant margin on all three, which supports its effectiveness and its ability to generalize across tasks.