An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation

Thai Tang Quoc, Duc Ha Minh, Tho Quan Thanh, Anh Nguyen-Duc
2024-08-28
Abstract: Large Language Models (LLMs) have recently advanced many applications in software engineering, particularly code generation. Among contemporary challenges, code generated by LLMs often suffers from inaccuracies and hallucinations, requiring external inputs to correct. One recent strategy to fix these issues is to refine the code generated from LLMs using the input from the model itself (self-augmented). In this work, we propose a novel method, namely CoT-SelfEvolve. CoT-SelfEvolve iteratively and automatically refines code through a self-correcting process, guided by a chain of thought constructed from real-world programming problem feedback. Focusing on data science code, including Python libraries such as NumPy and Pandas, our evaluations on the DS-1000 dataset demonstrate that CoT-SelfEvolve significantly outperforms existing models in solving complex problems. The framework shows substantial improvements in both initial code generation and subsequent iterations, with the model's accuracy increasing significantly with each additional iteration. This highlights the effectiveness of using chain-of-thought prompting to address complexities revealed by program executor traceback error messages. We also discuss how CoT-SelfEvolve can be integrated into continuous software engineering environments, providing a practical solution for improving LLM-based code generation.
Software Engineering, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the inaccuracies and hallucinations of large language models (LLMs) when they generate data science code. Specifically, it asks how a self-correction mechanism can improve the quality of LLM-generated code. The authors point out that although LLMs often produce high-quality code, their output may still contain errors or logical flaws that require external input to correct. To this end, they propose a new method named CoT-SelfEvolve, which refines the code through an iterative, automated self-correction process guided by a chain of thought (CoT) constructed from feedback on real-world programming problems. The method targets data science code in particular, including code that uses Python libraries such as NumPy and Pandas. The main contribution of the paper is the CoT-SelfEvolve framework, which adds two key innovations to the existing SelfEvolve model: CoT prompting and integration of an external knowledge base. These innovations aim to overcome the limitations of current self-correcting LLM methods and to improve the accuracy and efficiency of code generation. In their experimental evaluation on the DS-1000 dataset, the authors show that CoT-SelfEvolve significantly outperforms existing models on complex problems, and that the model's accuracy improves significantly as the number of iterations increases.
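The iterative self-correction idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `run_candidate`, `self_correct`, and `fake_llm`, the prompt wording, and the attempt limit are all assumptions made for illustration, and the external knowledge base component is left out. The sketch only shows the general pattern of executing a candidate, capturing the traceback, and feeding it back to the model with a request to reason step by step.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_candidate(code: str, test: str) -> Optional[str]:
    """Run candidate code and a test snippet; return the traceback text on failure, None on success."""
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate's functions
        exec(test, namespace)   # run the check against them
    except Exception:
        return traceback.format_exc()
    return None


def self_correct(
    problem: str,
    test: str,
    llm: Callable[[str], str],   # hypothetical: any function mapping a prompt to code
    max_attempts: int = 3,       # assumed limit, not taken from the paper
) -> Tuple[str, int, bool]:
    """Generate code, execute it, and feed traceback errors back until the test passes."""
    prompt = f"Problem:\n{problem}\nWrite Python code that solves it."
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = llm(prompt)
        error = run_candidate(code, test)
        if error is None:
            return code, attempt, True
        # Chain-of-thought style feedback prompt (wording is an assumption).
        prompt = (
            f"Problem:\n{problem}\n\n"
            f"Previous attempt:\n{code}\n\n"
            f"Executor feedback:\n{error}\n"
            "Think step by step about the cause of the error, then return corrected code."
        )
    return code, max_attempts, False


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: returns buggy code first, a fix once it sees a traceback."""
    if "Traceback" in prompt:
        return "def mean(xs):\n    return sum(xs) / len(xs)\n"
    return "def mean(xs):\n    return sum(xs) / len(x)\n"  # NameError: x is undefined


final_code, attempts, passed = self_correct(
    "Compute the mean of a list of numbers.",
    "assert mean([1, 2, 3]) == 2",
    fake_llm,
)
print(attempts, passed)
```

In a real setting, `fake_llm` would be replaced by a call to an actual model, and the candidate would run in an isolated sandbox rather than through `exec` in the current process.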