GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model

Wei Liu, Ailun Yu, Daoguang Zan, Bo Shen, Wei Zhang, Haiyan Zhao, Zhi Jin, Qianxiang Wang

2024-09-13

Abstract: The performance of repository-level code completion depends upon the effective leverage of both general and repository-specific knowledge. Despite the impressive capability of code LLMs in general code completion tasks, they often exhibit less satisfactory performance on repository-level completion due to the lack of repository-specific knowledge in these LLMs. To address this problem, we propose GraphCoder, a retrieval-augmented code completion framework that leverages LLMs' general code knowledge and the repository-specific knowledge via a graph-based retrieval-generation process. In particular, GraphCoder captures the context of the completion target more accurately through a code context graph (CCG) that consists of control-flow, data-dependence, and control-dependence relations between code statements, a more structured way to capture the completion target context than the sequence-based context used in existing retrieval-augmented approaches. Based on the CCG, GraphCoder further employs a coarse-to-fine retrieval process to locate, in the current repository, code snippets whose context is similar to that of the completion target. Experimental results demonstrate both the effectiveness and efficiency of GraphCoder: compared to baseline retrieval-augmented methods, GraphCoder achieves higher exact match (EM) on average, with increases of +6.06 in code match and +6.23 in identifier match, while using less time and space.
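To make the idea of a code context graph more concrete, here is a minimal sketch in Python. It is an illustration under strong assumptions, not the authors' implementation: it handles only straight-line top-level statements, treats "control flow" as simple sequential order, derives data dependence from variable names via the standard `ast` module, and omits control dependence entirely. The names `build_toy_ccg` and `backward_slice` are hypothetical.

```python
import ast


def build_toy_ccg(source):
    """Build a toy context graph over the top-level statements of straight-line code.

    Returns (statements, edges), where each edge is (src_index, dst_index, kind).
    Assumption: no branches or loops, so control flow is just statement order.
    """
    stmts = ast.parse(source).body
    edges = set()
    last_def = {}  # variable name -> index of the statement that last assigned it

    for i, stmt in enumerate(stmts):
        if i > 0:
            edges.add((i - 1, i, "control_flow"))

        loads, stores = set(), set()
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Load):
                    loads.add(node.id)
                elif isinstance(node.ctx, ast.Store):
                    stores.add(node.id)

        # A statement depends on the statement that last defined each name it reads.
        for name in loads:
            if name in last_def:
                edges.add((last_def[name], i, "data_dep"))
        # Record definitions only after reads, so `x = x + 1` links to the earlier x.
        for name in stores:
            last_def[name] = i

    return [ast.unparse(s) for s in stmts], edges


def backward_slice(edges, target, max_hops):
    """Collect statements reachable backwards from `target` within `max_hops` edges."""
    preds = {}
    for src, dst, _kind in edges:
        preds.setdefault(dst, set()).add(src)

    seen = {target}
    frontier = {target}
    for _ in range(max_hops):
        frontier = {p for n in frontier for p in preds.get(n, ())} - seen
        seen |= frontier
    return sorted(seen)


if __name__ == "__main__":
    code = "a = 1\nb = 2\nc = a + 1\nd = c * 2\n"
    statements, graph = build_toy_ccg(code)
    for idx in backward_slice(graph, target=3, max_hops=1):
        print(idx, statements[idx])
```

In this toy version, the one-hop slice of the last statement contains only the statement that defines `c` plus the target itself, which hints at why a graph-shaped context can be more selective than a fixed window of preceding lines.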

Software Engineering

What problem does this paper attempt to address?

The paper addresses the weak performance of large language models (LLMs) on repository-level code completion, which stems from their lack of repository-specific knowledge. Although LLMs perform well on general code completion, they cannot readily learn or access knowledge that is specific to a repository (such as its code style and API usage), so their repository-level completions are often unsatisfactory. To solve this problem, the paper proposes GraphCoder, a retrieval-augmented generation framework built on a code context graph (CCG). It combines the general code knowledge of LLMs with repository-specific knowledge through a graph-based retrieval and generation process, aiming to improve both the effectiveness and the efficiency of repository-level code completion. The main innovation is that GraphCoder captures the context of the completion target more accurately by constructing a CCG, which is more structured than the sequence-based context used by existing methods. GraphCoder also adopts a coarse-to-fine retrieval process to find code fragments in the current repository whose context is similar to that of the completion target. Experimental results show that, compared with baseline retrieval-augmented methods, GraphCoder achieves higher exact match on average (+6.06 in code match and +6.23 in identifier match) while using less time and space.
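The coarse-to-fine retrieval idea can be sketched as a two-stage ranking. The example below is a hedged illustration only: the summary above does not state which similarity measures GraphCoder uses, so the coarse stage here is assumed to be Jaccard similarity over identifier-like tokens and the fine stage is assumed to be an order-aware statement alignment via `difflib.SequenceMatcher`. The function names and the sample snippets are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokens(code):
    """Identifier-like tokens of a code string, as an unordered set."""
    return set(re.findall(r"[A-Za-z_]\w*", code))


def jaccard(a, b):
    """Order-insensitive token overlap, used as the cheap coarse filter."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def coarse_to_fine(query_lines, candidates, coarse_k=3, fine_k=1):
    """Rank candidate snippets (lists of statement strings) against a query context.

    Stage 1 (coarse): keep the coarse_k candidates with the highest token overlap.
    Stage 2 (fine): re-rank the survivors by statement-level, order-aware similarity.
    """
    query_text = "\n".join(query_lines)
    coarse = sorted(
        candidates,
        key=lambda c: jaccard(query_text, "\n".join(c)),
        reverse=True,
    )[:coarse_k]
    fine = sorted(
        coarse,
        key=lambda c: SequenceMatcher(None, query_lines, c).ratio(),
        reverse=True,
    )
    return fine[:fine_k]


if __name__ == "__main__":
    query = ["conn = open_db(path)", "rows = conn.query(sql)"]
    snippets = [
        ["conn = open_db(path)", "rows = conn.query(sql)", "conn.close()"],
        ["rows = conn.query(sql)", "path = sql", "open_db(conn)"],
        ["x = 1", "y = 2"],
    ]
    print(coarse_to_fine(query, snippets, coarse_k=2, fine_k=1))
```

In this sample, the second snippet shares every token with the query and therefore wins the coarse stage, but the fine stage prefers the first snippet because its statements line up with the query in order. That is the motivation for pairing a cheap filter with a more structure-sensitive re-ranker.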

GraphCoder: Enhancing Repository-Level Code Completion Via Coarse-to-fine Retrieval Based on Code Context Graph

ContextModule: Improving Code Completion via Repository-level Contextual Information

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion

Prompt-based Code Completion via Multi-Retrieval Augmented Generation

RLCoder: Reinforcement Learning for Repository-Level Code Completion

Repoformer: Selective Retrieval for Repository-Level Code Completion

ReACC: A Retrieval-Augmented Code Completion Framework

Improving AST-Level Code Completion with Graph Retrieval and Multi-Field Attention

REPOFUSE: Repository-Level Code Completion with Fused Dual Context

Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

Enhancing Repository-Level Code Generation with Integrated Contextual Information

Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation

Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context

A Lightweight Framework for Adaptive Retrieval In Code Completion With Critique Model

A Graph Sequence Neural Architecture for Code Completion with Semantic Structure Features

Code Completion by Modeling Flattened Abstract Syntax Trees As Graphs

DroidCoder: Enhanced Android Code Completion with Context-Enriched Retrieval-Augmented Generation

RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation