An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation

Thai Tang Quoc, Duc Ha Minh, Tho Quan Thanh, Anh Nguyen-Duc

2024-08-28

Abstract: Large Language Models (LLMs) have recently advanced many applications in software engineering, particularly code generation. Among contemporary challenges, code generated by LLMs often suffers from inaccuracies and hallucinations, requiring external inputs to correct. One recent strategy to fix these issues is to refine the code generated by LLMs using input from the model itself (self-augmented). In this work, we propose a novel method, namely CoT-SelfEvolve. CoT-SelfEvolve iteratively and automatically refines code through a self-correcting process, guided by a chain of thought constructed from real-world programming problem feedback. Focusing on data science code, including Python libraries such as NumPy and Pandas, our evaluations on the DS-1000 dataset demonstrate that CoT-SelfEvolve significantly outperforms existing models in solving complex problems. The framework shows substantial improvements in both initial code generation and subsequent iterations, with the model's accuracy increasing significantly with each additional iteration. This highlights the effectiveness of using chain-of-thought prompting to address complexities revealed by program executor traceback error messages. We also discuss how CoT-SelfEvolve can be integrated into continuous software engineering environments, providing a practical solution for improving LLM-based code generation.

Software Engineering, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses the inaccuracies and hallucinations that large language models (LLMs) produce when generating data science code. Specifically, it studies how a self-correction mechanism can improve the quality of LLM-generated code. The authors point out that although LLMs are often capable of generating high-quality code, their output may still contain errors or logical flaws and usually needs external input to be corrected. To this end, they propose a new method named CoT-SelfEvolve, which refines code through an iterative, automated self-correction process guided by a chain of thought (CoT) constructed from feedback on real-world programming problems. The method targets data science code in particular, including code that uses Python libraries such as NumPy and Pandas. The main contribution of the paper is the CoT-SelfEvolve framework, which extends the existing SelfEvolve model with two key innovations: CoT prompting and integration of an external knowledge base. These innovations aim to overcome the limitations of current self-correcting LLM methods and to improve the accuracy and efficiency of code generation. In their experimental evaluation, the authors show that CoT-SelfEvolve significantly outperforms existing models on complex problems, and that the model's accuracy improves significantly as the number of iterations increases.
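The iterative loop described above (generate code, execute it, feed the traceback back to the model) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `run_candidate`, `self_correct`, and `stub_model`, the prompt wording, and the iteration limit are all assumptions made for illustration, the external knowledge base is omitted, and a hard-coded stub stands in for a real LLM call.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_candidate(code: str) -> Optional[str]:
    """Execute candidate code; return None on success, else the traceback text."""
    try:
        exec(code, {})  # illustration only: real systems should sandbox execution
        return None
    except Exception:
        return traceback.format_exc()


def self_correct(
    problem: str,
    generate: Callable[[str], str],
    max_iterations: int = 3,
) -> Tuple[str, int, bool]:
    """Regenerate code until it runs cleanly or the iteration budget is spent."""
    prompt = problem
    code = ""
    for attempt in range(1, max_iterations + 1):
        code = generate(prompt)
        error = run_candidate(code)
        if error is None:
            return code, attempt, True
        # Hypothetical CoT-style feedback prompt built from the executor traceback.
        prompt = (
            f"{problem}\n\nPrevious attempt:\n{code}\n\n"
            f"Traceback:\n{error}\n"
            "Reason step by step about the cause of the error, "
            "then return corrected code."
        )
    return code, max_iterations, False


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: fails first, fixes after seeing a traceback."""
    if "Traceback" in prompt:
        return "values = [1, 2, 3]\nresult = sum(values) / len(values)"
    return "values = [1, 2, 3]\nresult = sum(values) / count"  # raises NameError


final_code, attempts, ok = self_correct("Compute the mean of [1, 2, 3].", stub_model)
print(attempts, ok)
```

In this sketch the stub's first answer fails with a `NameError`, the traceback is appended to the prompt, and the second answer runs cleanly. A real setup would replace `stub_model` with a model call and would check the output against test cases rather than only checking that the code executes.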

SelfEvolve: A Code Evolution Framework via Large Language Models

A Self-Iteration Code Generation Method Based on Large Language Models

An Empirical Study of Code Generation Errors made by Large Language Models

Self-Edit: Fault-Aware Code Editor for Code Generation

Fixing Code Generation Errors for Large Language Models

A Deep Dive into Large Language Model Code Generation Mistakes: What and Why?

Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies

Teaching Large Language Models to Self-Debug

Can Large Language Models Invent Algorithms to Improve Themselves?

Better Language Models of Code through Self-Improvement

CompCodeVet: A Compiler-guided Validation and Enhancement Approach for Code Dataset

Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models

ProgCo: Program Helps Self-Correction of Large Language Models

What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

Code Optimization Chain-of-Thought: Structured Understanding and Self-Checking

Auto-Evolve: Enhancing Large Language Model's Performance via Self-Reasoning Framework

CYCLE: Learning to Self-Refine the Code Generation

CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing

Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback

Where Do Large Language Models Fail When Generating Code?