Abstract: Large language models (LLMs) have already revolutionized code generation, after being pretrained on publicly available code data. However, while various methods have been proposed to augment LLMs with retrieved knowledge and enhance the quality of code generation, the performance of these retrieval-based methods is limited by the strength of the retrievers used. In addition, while LLMs show great emergent ability, they still struggle to produce the correct code in one turn. To address these challenges, we propose a novel two-step pipeline, called SelfEvolve, that leverages LLMs as both knowledge providers and self-reflective programmers. Unlike retrieval-based methods, SelfEvolve obtains the knowledge from input prompts and generates intermediate code based on the generated knowledge. After that, SelfEvolve asks the LLM to act as an expert programmer to perform debugging for the generated code. This is achieved by receiving the error message from the interpreter, without requiring special test cases for correctness verification. We evaluate SelfEvolve on three code generation datasets, including DS-1000 for data science code, HumanEval for software engineering code, and TransCoder for C++-to-Python translation. Our empirical experiments show that SelfEvolve outperforms strong baselines by a significant margin on all datasets. We also conduct exhaustive analytical experiments to validate the effectiveness of the two stages of SelfEvolve, and find that both are superior to other prompting-based methods. Further scalability analysis demonstrates that SelfEvolve can be adapted to other more advanced models, such as GPT-4, and bring consistent efficacy improvement.

What problem does this paper attempt to address?

The paper addresses the performance limitations of existing code generation methods, particularly those built on large language models (LLMs). Although pre-training on publicly available code has significantly improved the quality of LLM-generated code, these models still often fail to produce correct code in a single attempt. Retrieval-based methods can supply LLMs with additional knowledge, but their performance is bounded by the strength of the retriever, and they may suffer from domain mismatch when adapted to different tasks. To overcome these problems, the paper proposes a framework named SelfEvolve, which improves code generation in two stages:

1. **Knowledge Generation**: SelfEvolve uses the LLM as a knowledge provider. It generates the knowledge relevant to the input prompt, then generates intermediate code based on that knowledge. Unlike traditional retrieval-based methods, SelfEvolve does not rely on large external knowledge bases or specialized retrieval tools; it extracts the knowledge directly from the LLM.
2. **Code Debugging**: After the preliminary code is generated, SelfEvolve asks the LLM to act as an expert programmer and debug it. The LLM receives the error message produced by the interpreter, so no specially prepared test cases are needed to verify correctness. This is also closer to real programming practice, because the code is revised based on actual error feedback.

SelfEvolve is designed to improve the accuracy of code generation, reduce the likelihood of producing incorrect code, and perform well across different code generation tasks.

The paper evaluates SelfEvolve on three code generation datasets: DS-1000 (data science code), HumanEval (general-purpose code), and TransCoder (C++-to-Python translation). The reported results show that SelfEvolve outperforms strong baselines by a significant margin on all three datasets, which supports its effectiveness and generalization ability.
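
The two stages described above can be sketched as a short loop. This is only an illustrative outline, not the paper's implementation: the `llm` callable, the prompt wording, the `run_code` helper, and the `max_debug_rounds` limit are all assumptions introduced here. The sketch runs the candidate code in a separate Python interpreter and feeds any error text back to the model.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, Optional

# Hypothetical model interface: takes a prompt string, returns a completion.
LLM = Callable[[str], str]


def run_code(code: str, timeout: float = 10.0) -> Optional[str]:
    """Run code in a fresh interpreter; return the error text, or None on success."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "TimeoutExpired: execution exceeded the time limit"
    finally:
        os.unlink(path)
    return None if proc.returncode == 0 else proc.stderr.strip()


def self_evolve(task: str, llm: LLM, max_debug_rounds: int = 2) -> str:
    """Two-stage sketch: generate knowledge and code, then debug from error messages."""
    # Stage 1: ask the model for knowledge, then for code conditioned on it.
    # The prompts below are placeholders, not the prompts used in the paper.
    knowledge = llm(f"List the APIs and facts needed to solve this task:\n{task}")
    code = llm(
        f"Task:\n{task}\n\nRelevant knowledge:\n{knowledge}\n\nWrite Python code."
    )

    # Stage 2: execute the code and let the model repair it from the error text.
    for _ in range(max_debug_rounds):
        error = run_code(code)
        if error is None:
            break  # the interpreter raised nothing, so stop debugging
        code = llm(
            "You are an expert programmer. Fix this code.\n\n"
            f"Code:\n{code}\n\nError:\n{error}"
        )
    return code
```

Calling `self_evolve(task, llm)` with any function that maps a prompt to a completion returns the last code candidate. A clean run only means the interpreter reported no error, not that the code is correct, and model-generated code should be executed in a sandbox.
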

SelfEvolve: A Code Evolution Framework via Large Language Models

A Self-Iteration Code Generation Method Based on Large Language Models

An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation

Code Generation Using Self-Interactive Assistant

Auto-Evolve: Enhancing Large Language Model's Performance via Self-Reasoning Framework

Large Language Models as Code Executors: An Exploratory Study

AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation

Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation

On the Effectiveness of Large Language Models in Domain-Specific Code Generation

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

Knowledge-Aware Code Generation with Large Language Models

CoEvo: Continual Evolution of Symbolic Solutions Using Large Language Models

Supervised Knowledge Makes Large Language Models Better In-context Learners

Towards more realistic evaluation of LLM-based code generation: an experimental study and beyond

Evolving Knowledge Distillation with Large Language Models and Active Learning

Self-planning Code Generation with Large Language Models

L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

Multilingual Code Co-Evolution Using Large Language Models

JumpCoder: Go Beyond Autoregressive Coder via Online Modification