Abstract: Large Language Models for Code (LLMs4Code) have been found to exhibit outstanding performance in the software engineering domain, especially in coding tasks. However, even the most advanced LLMs4Code inevitably contain incorrect or outdated code knowledge. Due to the high cost of training LLMs4Code, it is impractical to retrain the models to fix this problematic code knowledge. Model editing is an emerging technical field for effectively and efficiently correcting erroneous knowledge in LLMs, and various model editing techniques and benchmarks have been proposed recently. Despite that, a comprehensive study that thoroughly compares and analyzes the performance of state-of-the-art model editing techniques for adapting the knowledge within LLMs4Code across various code-related tasks is notably absent. To bridge this gap, we perform the first systematic study on applying state-of-the-art model editing approaches to repair the inaccuracy of LLMs4Code. To that end, we introduce a benchmark named CLMEEval, which consists of two datasets: CoNaLa-Edit (CNLE) with 21K+ code generation samples and CodeSearchNet-Edit (CSNE) with 16K+ code summarization samples. With the help of CLMEEval, we evaluate six advanced model editing techniques on three LLMs4Code: CodeLlama (7B), CodeQwen1.5 (7B), and Stable-Code (3B). Our findings include that the external memorization-based GRACE approach achieves the best knowledge editing effectiveness and specificity (the editing does not influence untargeted knowledge), while generalization (whether the editing can generalize to other semantically identical inputs) is a universal challenge for existing techniques. Furthermore, building on in-depth case analysis, we introduce an enhanced version of GRACE called A-GRACE, which incorporates contrastive learning to better capture the semantics of the inputs.
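
To make "external memorization-based" editing more concrete, here is a toy sketch of the general idea: corrections live in a lookup memory outside the model weights, and a stored correction is used only when an incoming input lands close enough to a stored key. This is an illustration, not the GRACE implementation. The `EditMemory` class, the `radius` parameter, the 2-D "embeddings", and the `edited_model` helper are all hypothetical names and values chosen for the example.

```python
import math
from typing import List, Optional, Sequence


class EditMemory:
    """Toy external memory of (key embedding -> corrected output) pairs.

    Hypothetical design for illustration only: a stored correction applies
    when the query lies within `radius` (Euclidean distance) of its key.
    """

    def __init__(self, radius: float) -> None:
        self.radius = radius
        self.keys: List[Sequence[float]] = []
        self.values: List[str] = []

    def add(self, key: Sequence[float], value: str) -> None:
        self.keys.append(key)
        self.values.append(value)

    def lookup(self, query: Sequence[float]) -> Optional[str]:
        # Return the value of the nearest key inside the radius, if any.
        best_value: Optional[str] = None
        best_dist = math.inf
        for key, value in zip(self.keys, self.values):
            dist = math.dist(key, query)
            if dist <= self.radius and dist < best_dist:
                best_value, best_dist = value, dist
        return best_value


def edited_model(query: Sequence[float], memory: EditMemory, base_output: str) -> str:
    """Use the stored correction on a memory hit, else the unedited output."""
    hit = memory.lookup(query)
    return hit if hit is not None else base_output


memory = EditMemory(radius=0.5)
memory.add((1.0, 0.0), "patched output")

near_edit = edited_model((1.1, 0.0), memory, "original output")   # inside the radius
paraphrase = edited_model((1.0, 1.0), memory, "original output")  # outside the radius
unrelated = edited_model((3.0, 3.0), memory, "original output")   # far from any key
```

In this toy setup, inputs far from every key keep the original output, which is one way to picture specificity. A paraphrase whose embedding falls outside the radius also keeps the original output, which is one way to picture the generalization problem; an encoder that places semantically identical inputs closer together would reduce such misses.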

What problem does this paper attempt to address?

The problem this paper attempts to solve is how to effectively and efficiently correct inaccurate or out-of-date knowledge in large language models for code, without retraining the entire model. Although state-of-the-art large language models perform well in software engineering, especially in coding tasks, they may contain incorrect or out-of-date code knowledge. Since retraining these models is costly and time-consuming, a newer technique, model editing, is used to correct specific errors in the model directly without affecting other, non-target knowledge. However, there is currently no comprehensive study of how existing model editing techniques perform on code generation and code summarization tasks. To address this, the paper constructs a benchmark, CLMEEval, comprising two datasets, and uses it to evaluate six advanced model editing techniques on three large language models for code. Through this study, the paper aims to fill this gap and to provide an in-depth analysis of the effectiveness, generalization ability, and specificity of different model editing techniques when updating the knowledge of these models.
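
The three evaluation dimensions named above can be sketched as simple scores. The following is a minimal sketch under stated assumptions, not the benchmark's scoring code: it uses exact string match as the scoring rule, which is an assumption, and the function names, toy prompts, and toy models are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]  # maps a prompt to generated text


def exact_match_rate(model: Model, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (prompt, target) pairs the model reproduces exactly."""
    if not pairs:
        return 0.0
    hits = sum(model(prompt).strip() == target.strip() for prompt, target in pairs)
    return hits / len(pairs)


def evaluate_edit(
    edited: Model,
    original: Model,
    edit_pairs: List[Tuple[str, str]],
    paraphrase_pairs: List[Tuple[str, str]],
    unrelated_prompts: List[str],
) -> Dict[str, float]:
    # Specificity: on untargeted prompts the edited model should behave
    # exactly like the original model.
    unchanged = [(p, original(p)) for p in unrelated_prompts]
    return {
        "effectiveness": exact_match_rate(edited, edit_pairs),
        "generalization": exact_match_rate(edited, paraphrase_pairs),
        "specificity": exact_match_rate(edited, unchanged),
    }


# Toy models: the "edit" fixes one prompt but does not transfer to a paraphrase.
BASE = {"read a json file": "json.loads(path)", "sort a list": "sorted(xs)"}
FIX = {"read a json file": "json.load(open(path))"}


def original_model(prompt: str) -> str:
    return BASE.get(prompt, "")


def edited_model(prompt: str) -> str:
    return FIX.get(prompt, original_model(prompt))


scores = evaluate_edit(
    edited_model,
    original_model,
    edit_pairs=[("read a json file", "json.load(open(path))")],
    paraphrase_pairs=[("load json from a file", "json.load(open(path))")],
    unrelated_prompts=["sort a list"],
)
```

Here the toy edit scores 1.0 on effectiveness and specificity but 0.0 on generalization, mirroring the pattern the abstract reports as the common weakness of existing techniques.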

Model Editing for LLMs4Code: How Far are We?
