Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement

Weiqing Yang, Hanbin Wang, Zhenghao Liu, Xinze Li, Yukun Yan, Shuo Wang, Yu Gu, Minghe Yu, Zhiyuan Liu, Ge Yu

2024-08-09

Abstract: Debugging is a vital aspect of software development, yet the debugging capabilities of Large Language Models (LLMs) remain largely unexplored. This paper first introduces DEBUGEVAL, a comprehensive benchmark designed to evaluate the debugging capabilities of LLMs. DEBUGEVAL collects data from existing high-quality datasets and defines four tasks to evaluate debugging effectiveness: BUG Localization, BUG Identification, Code Review, and Code Repair. Additionally, to enhance the code debugging ability of LLMs, this paper proposes a CoMmunicative Agent BaSed DaTa REfinement FRamework (MASTER), which generates refined code debugging data for supervised fine-tuning. Specifically, MASTER employs the Code Quizzer to generate refined data according to the defined tasks of DEBUGEVAL. Then the Code Learner acts as a critic and retains the generated problems that it cannot solve. Finally, the Code Teacher provides a detailed Chain-of-Thought based solution to each retained problem. We collect the synthesized data and fine-tune the Code Learner to enhance its debugging ability, yielding the NeuDebugger model. Our experiments evaluate various LLMs and NeuDebugger in the zero-shot setting on DEBUGEVAL. Experimental results demonstrate that 7B-scale LLMs, even code-oriented ones, have weaker debugging capabilities. In contrast, larger models (over 70B) show convincing debugging ability. Our further analyses illustrate that MASTER is an effective method to enhance code debugging ability by synthesizing data for Supervised Fine-Tuning (SFT) of LLMs.

Software Engineering, Artificial Intelligence

What problem does this paper attempt to address?

The paper attempts to address two main issues:

1. **Evaluating the code debugging capabilities of large language models (LLMs)**:
   - Although large language models perform well in tasks such as code generation and translation, their performance in code debugging has not been fully explored and evaluated. To fill this gap, the authors designed a comprehensive benchmark, **DEBUGEVAL**, to assess the code debugging capabilities of large language models.
   - **DEBUGEVAL** includes four tasks: BUG Localization, BUG Identification, Code Review, and Code Repair. Together, these tasks evaluate a model's ability to locate errors, classify them, and provide correct fixes.
2. **Enhancing the code debugging capabilities of large language models**:
   - To address the limited diversity and insufficient complexity of data in existing code debugging benchmarks, the authors proposed **MASTER**, a data refinement framework based on communicative agents.
   - The **MASTER** framework uses three agents (Code Quizzer, Code Learner, and Code Teacher) to generate high-quality supervised fine-tuning data that enhances the code debugging capabilities of large language models.
   - Specifically, **Code Quizzer** generates diverse code debugging problems, **Code Learner** acts as a critic and retains the problems it cannot solve, and **Code Teacher** provides detailed solutions and explanations for them.

Through these two efforts, the paper provides a comprehensive benchmark for evaluating the code debugging capabilities of large language models and proposes an effective method to enhance these models' debugging performance in real development environments.
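The three-agent loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Problem`, `SFTExample`, and `refine` are hypothetical, the agents are passed in as plain callables (in practice each would wrap an LLM call), and checking the Code Learner by exact match against a reference answer is an assumption made only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# The four DEBUGEVAL tasks named in the abstract.
TASKS = ("BUG Localization", "BUG Identification", "Code Review", "Code Repair")


@dataclass
class Problem:
    """A generated debugging problem (hypothetical structure)."""
    task: str
    prompt: str
    answer: str  # reference answer; assumed to come from the Code Quizzer


@dataclass
class SFTExample:
    """One supervised fine-tuning record (hypothetical structure)."""
    task: str
    prompt: str
    solution: str


def refine(
    seeds: Iterable[str],
    quizzer: Callable[[str, str], Problem],
    learner: Callable[[Problem], str],
    teacher: Callable[[Problem], str],
) -> List[SFTExample]:
    """Keep only problems the learner fails, then attach a teacher solution."""
    kept: List[SFTExample] = []
    for seed in seeds:
        for task in TASKS:
            problem = quizzer(seed, task)  # Code Quizzer: generate a problem
            # Code Learner: attempt it; solved problems are discarded.
            # Exact-match checking is an assumption of this sketch.
            if learner(problem).strip() == problem.answer.strip():
                continue
            # Code Teacher: write a step-by-step solution for the hard problem.
            kept.append(SFTExample(task, problem.prompt, teacher(problem)))
    return kept


# Toy stand-ins for the three agents, used only to exercise the loop.
def toy_quizzer(seed: str, task: str) -> Problem:
    return Problem(task=task, prompt=f"[{task}] {seed}", answer="B")


def toy_learner(problem: Problem) -> str:
    # Pretend the learner can only solve Code Review problems.
    return "B" if problem.task == "Code Review" else "A"


def toy_teacher(problem: Problem) -> str:
    return f"Step-by-step reasoning for {problem.task}. Answer: {problem.answer}"


sft_data = refine(["x = 1 / 0"], toy_quizzer, toy_learner, toy_teacher)
```

With the toy agents, the Code Review problem is dropped because the learner answers it correctly, and the other three problems are kept with teacher-written solutions. The resulting records are what would then be used to fine-tune the Code Learner.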

MdEval: Massively Multilingual Code Debugging

DebugBench: Evaluating Debugging Capability of Large Language Models

LDB: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step

RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance

Training LLMs to Better Self-Debug and Explain Code

A Unified Debugging Approach via LLM-Based Multi-Agent Synergy

How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging

Effective Large Language Model Debugging with Best-first Tree Search

HDLdebugger: Streamlining HDL debugging with Large Language Models

Model Editing for LLMs4Code: How Far are We?

Debugging with Open-Source Large Language Models: An Evaluation

From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

Fine-tuning Large Language Models to Improve Accuracy and Comprehensibility of Automated Code Review

Fine-grained LLM Agent: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback

PDC & DM-SFT: A Road for LLM SQL Bug-Fix Enhancing

MEIC: Re-thinking RTL Debug Automation using LLMs

CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors

Exploring the Capabilities of LLMs for Code Change Related Tasks

Fixing Code Generation Errors for Large Language Models