Abstract: Code completion models have made significant progress in recent years. Recently, repository-level code completion has drawn more attention in modern software development, and several baseline methods and benchmarks have been proposed. However, existing repository-level code completion methods often fall short of fully using the extensive context of a project repository, such as the intricacies of relevant files and class hierarchies. Besides, existing benchmarks usually focus on limited code completion scenarios, which cannot reflect the repository-level code completion abilities of existing methods well. To address these limitations, we propose R2C2-Coder to enhance and benchmark the real-world repository-level code completion abilities of code Large Language Models, where R2C2-Coder includes a code prompt construction method, R2C2-Enhance, and a well-designed benchmark, R2C2-Bench. Specifically, in R2C2-Enhance, we first construct the candidate retrieval pool and then assemble the completion prompt by retrieving from the retrieval pool for each completion cursor position. Second, based on R2C2-Enhance, we construct a more challenging and diverse R2C2-Bench with training, validation and test splits, where a context perturbation strategy is proposed to simulate real-world repository-level code completion well. Extensive results on multiple benchmarks demonstrate the effectiveness of our R2C2-Coder.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to address two main issues that existing code completion models face in repository-level code completion:

1. **Insufficient Context Utilization**: Existing repository-level code completion methods often fail to fully utilize the extensive contextual information within a project repository, such as the complexity of related files and class hierarchies. These methods typically focus only on local and discrete information, neglecting the richer and more fundamental context within program semantics.
2. **Limited Benchmarking Scenarios**: Current benchmarks usually cover only limited code completion scenarios, so they do not comprehensively reflect how existing methods perform on repository-level code completion. This leads to an incomplete and unrealistic evaluation of model performance.

To address these issues, the authors propose **R2C2-Coder**, which includes a code prompt construction method, **R2C2-Enhance**, and a new benchmark, **R2C2-Bench**. Specifically:

- **R2C2-Enhance**:
  - **Candidate Retrieval Pool Construction**: Using a parser generator tool (e.g., Tree-sitter) to extract abstract context from each file and fine-grained local information from code snippets, and combining both into a candidate retrieval pool.
  - **Completion Prompt Construction**: Generating a retrieval query from the current cursor position, retrieving relevant context from the candidate retrieval pool, and combining it with the context of the current file to assemble the completion prompt.
- **R2C2-Bench**:
  - **Dataset Generation**: Collecting licensed repositories from GitHub to generate a dataset with training, validation, and test splits.
  - **Context Perturbation Strategy**: Perturbing the retrieved context to simulate real-world repository-level code completion scenarios, making the benchmark more diverse and more challenging.

Through these methods, R2C2-Coder aims to enhance and evaluate the code completion capabilities of large language models at the repository level in real-world scenarios. Experimental results on multiple benchmarks demonstrate the effectiveness of R2C2-Coder.
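
The two R2C2-Enhance steps described above (building a candidate retrieval pool, then retrieving from it for a cursor position) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper uses Tree-sitter, whereas this sketch substitutes Python's standard `ast` module to extract signatures, and it assumes Jaccard token overlap as the retrieval score. All names (`Candidate`, `build_pool`, `assemble_prompt`, the `window`, `top_k`, and `query_lines` parameters) are hypothetical.

```python
import ast
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    kind: str  # "abstract" (file-level signatures) or "snippet" (local code window)
    text: str


def abstract_context(source):
    """Coarse file summary: the first line of every class/function definition.

    Assumption: Python's ast module stands in for Tree-sitter here.
    """
    lines = source.splitlines()
    defs = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    return "\n".join(
        lines[node.lineno - 1].strip()
        for node in ast.walk(ast.parse(source))
        if isinstance(node, defs)
    )


def snippets(source, window=4):
    """Fine-grained local information: fixed-size, non-overlapping line windows."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + window]) for i in range(0, len(lines), window)]


def build_pool(repo, window=4):
    """Build the candidate retrieval pool from a {path: source} mapping."""
    pool = []
    for path, source in repo.items():
        pool.append(Candidate(path, "abstract", abstract_context(source)))
        pool.extend(Candidate(path, "snippet", s) for s in snippets(source, window))
    return pool


def tokens(text):
    return set(re.findall(r"[A-Za-z_]\w*", text))


def jaccard(a, b):
    """Assumed retrieval score: identifier-set overlap between two texts."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def assemble_prompt(pool, current_path, prefix, top_k=2, query_lines=3):
    """Query with the lines just before the cursor, then prepend the top-k hits."""
    query = "\n".join(prefix.splitlines()[-query_lines:])
    others = [c for c in pool if c.path != current_path]
    ranked = sorted(others, key=lambda c: jaccard(query, c.text), reverse=True)
    context = "\n".join(f"# {c.path} ({c.kind})\n{c.text}" for c in ranked[:top_k])
    return f"{context}\n# {current_path}\n{prefix}"


repo = {
    "shapes.py": "class Circle:\n    def area(self, radius):\n"
                 "        return 3.14 * radius * radius\n",
    "io_utils.py": "def read_text(path):\n    with open(path) as f:\n"
                   "        return f.read()\n",
}
prefix = "from shapes import Circle\nc = Circle()\nprint(c.area("
pool = build_pool(repo)
prompt = assemble_prompt(pool, "main.py", prefix, top_k=1)
print(prompt)
```

In this toy repository the query shares the identifiers `Circle` and `area` with `shapes.py`, so its signature summary is retrieved and placed ahead of the in-file prefix, while `io_utils.py` is left out. A real system would likely use a stronger retriever and a token budget for the prompt; those choices are outside what this summary specifies.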

R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

RLCoder: Reinforcement Learning for Repository-Level Code Completion

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

ExecRepoBench: Multi-level Executable Code Completion Evaluation

GraphCoder: Enhancing Repository-Level Code Completion Via Coarse-to-fine Retrieval Based on Code Context Graph

Enhancing Repository-Level Code Generation with Integrated Contextual Information

Prompt-based Code Completion via Multi-Retrieval Augmented Generation

Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion

RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems

Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion?

RepoMasterEval: Evaluating Code Completion via Real-World Repositories

Repoformer: Selective Retrieval for Repository-Level Code Completion

RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation

Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

A Lightweight Framework for Adaptive Retrieval In Code Completion With Critique Model

Repository-Level Prompt Generation for Large Language Models of Code

ReACC: A Retrieval-Augmented Code Completion Framework

REPOFUSE: Repository-Level Code Completion with Fused Dual Context