R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

Ken Deng, Jiaheng Liu, He Zhu, Congnan Liu, Jingxin Li, Jiakai Wang, Peng Zhao, Chenchen Zhang, Yanan Wu, Xueqiao Yin, Yuanxing Zhang, Wenbo Su, Bangyu Xiang, Tiezheng Ge, Bo Zheng
2024-06-04
Abstract: Code completion models have made significant progress in recent years. Recently, repository-level code completion has drawn more attention in modern software development, and several baseline methods and benchmarks have been proposed. However, existing repository-level code completion methods often fall short of fully using the extensive context of a project repository, such as the intricacies of relevant files and class hierarchies. Besides, the existing benchmarks usually focus on limited code completion scenarios, which cannot adequately reflect the repository-level code completion abilities of existing methods. To address these limitations, we propose the R2C2-Coder to enhance and benchmark the real-world repository-level code completion abilities of code Large Language Models, where the R2C2-Coder includes a code prompt construction method R2C2-Enhance and a well-designed benchmark R2C2-Bench. Specifically, in R2C2-Enhance, we first construct the candidate retrieval pool and then assemble the completion prompt by retrieving from the retrieval pool for each completion cursor position. Second, based on R2C2-Enhance, we can construct a more challenging and diverse R2C2-Bench with training, validation and test splits, where a context perturbation strategy is proposed to simulate real-world repository-level code completion well. Extensive results on multiple benchmarks demonstrate the effectiveness of our R2C2-Coder.
Computation and Language, Software Engineering
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses two main issues that existing code completion models face in repository-level code completion:

1. **Insufficient Context Utilization**: Existing repository-level code completion methods often fail to fully use the extensive context of a project repository, such as the intricacies of relevant files and class hierarchies. These methods typically focus only on local and discrete information, neglecting the richer and more fundamental context within program semantics.
2. **Limited Benchmarking Scenarios**: Current benchmarks usually cover only limited code completion scenarios, so they do not comprehensively reflect how existing methods perform at repository-level code completion. This leads to an incomplete and unrealistic evaluation of model performance.

To address these issues, the authors propose **R2C2-Coder**, which includes a code prompt construction method, **R2C2-Enhance**, and a new benchmark, **R2C2-Bench**. Specifically:

- **R2C2-Enhance**:
  - **Candidate Retrieval Pool Construction**: Using parser generator tools (e.g., Tree-sitter) to extract abstract context from each file and fine-grained local information from code snippets, which together form a candidate retrieval pool.
  - **Completion Prompt Construction**: Generating a retrieval query from the current cursor position, retrieving relevant context from the candidate retrieval pool, and combining it with the context of the current file to assemble the completion prompt.
- **R2C2-Bench**:
  - **Dataset Generation**: Collecting licensed repositories from GitHub to generate a dataset with training, validation, and test splits.
  - **Context Perturbation Strategy**: Perturbing the retrieved context to simulate real-world repository-level code completion scenarios, which makes the benchmark more diverse and challenging.

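The two R2C2-Enhance steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: Python's standard `ast` module stands in for Tree-sitter, token-level Jaccard similarity stands in for whatever retriever the paper uses, and every name here (`Candidate`, `build_pool`, `retrieve`, `assemble_prompt`, the window size, the toy repository) is a hypothetical choice made for illustration.

```python
import ast
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    kind: str  # "abstract" (signatures only) or "snippet" (raw code window)
    text: str


def abstract_context(source: str) -> str:
    """Keep only class/function header lines (assumed stand-in for a Tree-sitter pass)."""
    lines = source.splitlines()
    keep = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            keep.append((node.lineno, lines[node.lineno - 1].strip()))
    return "\n".join(text for _, text in sorted(keep))


def build_pool(repo: dict[str, str], window: int = 4) -> list[Candidate]:
    """Step 1: one abstract entry per file plus fixed-size snippet windows."""
    pool = []
    for path, source in repo.items():
        signatures = abstract_context(source)
        if signatures:
            pool.append(Candidate(path, "abstract", signatures))
        lines = source.splitlines()
        for start in range(0, len(lines), window):
            chunk = "\n".join(lines[start:start + window]).strip()
            if chunk:
                pool.append(Candidate(path, "snippet", chunk))
    return pool


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[A-Za-z_]\w*", text))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(pool, query: str, current_path: str, top_k: int = 2):
    """Rank candidates from other files by token overlap with the query."""
    q = tokens(query)
    others = [c for c in pool if c.path != current_path]
    ranked = sorted(others, key=lambda c: jaccard(q, tokens(c.text)), reverse=True)
    return ranked[:top_k]


def assemble_prompt(pool, current_path: str, prefix: str,
                    query_lines: int = 3, top_k: int = 2) -> str:
    """Step 2: query from the lines before the cursor, then retrieved context + prefix."""
    query = "\n".join(prefix.splitlines()[-query_lines:])
    parts = []
    for cand in retrieve(pool, query, current_path, top_k):
        commented = "\n".join("# " + line for line in cand.text.splitlines())
        parts.append(f"# [{cand.kind}] {cand.path}\n{commented}")
    parts.append(prefix)  # the current file's context always comes last
    return "\n\n".join(parts)


# Toy repository (hypothetical contents) and a cursor at the end of `prefix`.
repo = {
    "geometry.py": (
        "class Circle:\n"
        "    def __init__(self, radius):\n"
        "        self.radius = radius\n"
        "\n"
        "    def area(self):\n"
        "        return 3.14159 * self.radius ** 2\n"
    ),
    "text_utils.py": "def slugify(title):\n    return title.lower().replace(' ', '-')\n",
}
pool = build_pool(repo)
prefix = "from geometry import Circle\n\ncircle = Circle(radius=2)\nprint(circle."
prompt = assemble_prompt(pool, "main.py", prefix, top_k=1)
print(prompt)
```

Placing the retrieved context before the current file's prefix keeps the cursor at the very end of the prompt, so a left-to-right code model can continue directly from it. A context perturbation step, as described for R2C2-Bench, would plausibly act on the list returned by `retrieve` before assembly, but its details are not specified in this summary.
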
Through these methods, R2C2-Coder aims to both enhance and evaluate the repository-level code completion capabilities of code large language models in real-world scenarios. Experimental results on multiple benchmarks demonstrate the effectiveness of R2C2-Coder.