CoEdPilot: Recommending Code Edits with Learned Prior Edit Relevance, Project-wise Awareness, and Interactive Nature

Chenyan Liu, Yufan Cai, Yun Lin, Yuhuan Huang, Yunrui Pei, Bo Jiang, Ping Yang, Jin Song Dong, Hong Mei
DOI: https://doi.org/10.1145/3650212.3652142
2024-08-03
Abstract: Recent years have seen the development of LLM-based code generation. Compared to generating code in a software project, incremental code edits are empirically observed to be more frequent. The emerging code editing approaches usually formulate the problem as generating an edit based on known relevant prior edits and context. However, practical code edits can be more complicated. First, an editing session can include multiple (ir)relevant edits to the code under edit. Second, the inference of the subsequent edits is non-trivial as the scope of its ripple effect can be the whole project. In this work, we propose CoEdPilot, an LLM-driven solution to recommend code edits by discriminating the relevant edits, exploring their interactive natures, and estimating its ripple effect in the project. Specifically, CoEdPilot orchestrates multiple neural transformers to identify what and how to edit in the project regarding both edit location and edit content. When a user accomplishes an edit with an optional editing description, a Subsequent Edit Analysis first reports the most relevant files in the project with what types of edits (e.g., keep, insert, and replace) can happen for each line of their code. Next, an Edit-content Generator generates concrete edit options for the lines of code, regarding its relevant prior changes reported by an Edit-dependency Analyzer. Lastly, both the Subsequent Edit Analysis and the Edit-content Generator capture relevant prior edits as feedback to readjust their recommendations. We train our models by collecting over 180K commits from 471 open-source projects in 5 programming languages. Our extensive experiments show that CoEdPilot can well predict the edits (i.e., predicting edit location with an accuracy of 70.8%-85.3%, and the edit content with an exact match rate of 41.8% and BLEU4 score of 60.7)...
Software Engineering
What problem does this paper attempt to address?
The problem this paper addresses is that existing code-editing methods rest on assumptions that break down in real editing sessions. Specifically:

1. **Relevance assumption of prior edits**: Existing work usually assumes that all previous edits are relevant to the target edit. In practice this is not always the case, and irrelevant prior edits can introduce noise that reduces the accuracy of the generated edits.
2. **Availability assumption of subsequent edit locations**: Knowing where edits can occur is not straightforward, because the impact of a prior edit may spread throughout the project.
3. **Interactive nature among multiple edits**: Code edits may have syntactic dependencies and semantic associations with one another, but existing Transformer models are not designed to capture such interactions.

To solve these problems, the authors propose CoEdPilot, a language-model-based (LM-based) solution that improves code-editing recommendations in the following ways:

- **Predict relevant prior edits**: Identify the previous edits that are most relevant to the current edit.
- **Predict the location of subsequent edits**: Determine the files and lines of code where edits may occur.
- **Capture the interactive nature among edits**: Consider the syntactic, semantic, and logical propagation among edits.

### Specific problem description

The practical problems mentioned in the paper include:

- **Multi-edit situations in an edit session**: An edit session may contain multiple related or unrelated edits.
- **Inferential complexity of subsequent edits**: The ripple effect of an edit may reach the entire project, so inferring the subsequent edits is non-trivial.

### Solution overview

CoEdPilot addresses the above problems through the following steps:

1. **Two-stage edit location prediction**:
   - **First stage**: Use the Edit-propagating File Locator to scan the entire project and report, at a coarse-grained level, the set of files \( F \) where changes may occur.
   - **Second stage**: Apply the Edit-propagating Line Locator to these files to predict, at a fine-grained level, the type of edit (such as keep, insert, replace) for each line of code.
2. **Edit content generation**: Generate specific edit content using the Edit-content Generator, based on the predicted edit locations and the relevant prior edits.
3. **Edit-dependency analysis**: Train the Edit-dependency Analyzer to select the most relevant prior edits with the potential for syntactic, semantic, and logical propagation.
4. **Interactive edit recommendation**: Once a user confirms a recommended edit option, that edit becomes a new prior edit and triggers a new round of edit recommendations.

### Experimental results

The paper verifies the effectiveness of CoEdPilot through extensive experiments. The main conclusions are as follows:

- CoEdPilot predicts edit locations with an accuracy of 70.8%-85.3%, and achieves an exact-match rate of 41.8% and a BLEU4 score of 60.7 on edit content.
- Compared with existing edit generators (such as GRACE and CCT5), CoEdPilot significantly improves the exact-match rate and BLEU4 score of edit generation.

In general, by learning prior-edit relevance, adding project-level awareness, and modeling the interactive nature among edits, CoEdPilot makes code-editing recommendations more practical and efficient.
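The control flow in the solution overview can be illustrated with a toy sketch. This is not CoEdPilot's implementation: the paper uses trained neural models for each component, whereas every function below is a hypothetical rule-based stand-in (simple token matching) chosen only so the example runs without any model. The names `Edit`, `token_changes`, `locate_files`, `locate_lines`, `select_relevant`, `generate`, `recommend`, and `accept` are assumptions introduced for illustration, and only the `keep` and `replace` labels are modeled.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

KEEP, REPLACE = "keep", "replace"  # subset of the edit types named in the summary


@dataclass(frozen=True)
class Edit:
    """One accepted edit: a line's old and new text (hypothetical record format)."""
    path: str
    line: int
    old: str
    new: str


def token_changes(edit: Edit) -> List[Tuple[str, str]]:
    """(old_token, new_token) pairs that differ position-wise in the edit."""
    pairs = zip(re.findall(r"\w+", edit.old), re.findall(r"\w+", edit.new))
    return [(a, b) for a, b in pairs if a != b]


def mentions(text: str, token: str) -> bool:
    """True if `token` occurs in `text` as a whole word."""
    return re.search(rf"\b{re.escape(token)}\b", text) is not None


def select_relevant(prior: List[Edit], line: str) -> List[Edit]:
    """Toy stand-in for the Edit-dependency Analyzer: drop unrelated prior edits."""
    return [e for e in prior if any(mentions(line, a) for a, _ in token_changes(e))]


def locate_files(project: Dict[str, List[str]], prior: List[Edit]) -> List[str]:
    """Toy stand-in for stage 1: coarse scan for files that mention a changed token."""
    olds = [a for e in prior for a, _ in token_changes(e)]
    return [p for p, lines in project.items()
            if any(mentions("\n".join(lines), a) for a in olds)]


def locate_lines(lines: List[str], prior: List[Edit]) -> List[str]:
    """Toy stand-in for stage 2: one edit-type label per line."""
    return [REPLACE if select_relevant(prior, l) else KEEP for l in lines]


def generate(line: str, relevant: List[Edit]) -> str:
    """Toy stand-in for the Edit-content Generator: replay relevant token changes."""
    for e in relevant:
        for a, b in token_changes(e):
            line = re.sub(rf"\b{re.escape(a)}\b", b, line)
    return line


def recommend(project: Dict[str, List[str]], prior: List[Edit]) -> List[Tuple[str, int, str]]:
    """One recommendation round: (path, line index, proposed text) per flagged line."""
    recs = []
    for path in locate_files(project, prior):
        labels = locate_lines(project[path], prior)
        for i, (line, label) in enumerate(zip(project[path], labels)):
            if label == REPLACE:
                recs.append((path, i, generate(line, select_relevant(prior, line))))
    return recs


def accept(project: Dict[str, List[str]], prior: List[Edit], rec: Tuple[str, int, str]) -> None:
    """Apply a confirmed recommendation and record it as a new prior edit."""
    path, i, new = rec
    prior.append(Edit(path, i, project[path][i], new))
    project[path][i] = new


# Usage: the user renamed get_price to fetch_price in a.py; the c.py edit is unrelated.
project = {
    "a.py": ["def fetch_price(item):", "    return item.cost"],
    "b.py": ["total = get_price(cart)", "print(total)"],
    "c.py": ["x = 1"],
    "d.py": ["print(get_price(x))"],
}
prior = [
    Edit("a.py", 0, "def get_price(item):", "def fetch_price(item):"),
    Edit("c.py", 0, "x = 0", "x = 1"),
]
round_one = recommend(project, prior)   # proposals for b.py and d.py
accept(project, prior, round_one[0])    # user confirms the b.py proposal
round_two = recommend(project, prior)   # only the d.py proposal remains
```

In this sketch the unrelated `c.py` edit is filtered out by `select_relevant`, which mirrors the role the summary assigns to the Edit-dependency Analyzer, and `accept` feeds each confirmed edit back in as a prior edit before the next round, which mirrors the interactive step.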