Abstract: Repository-level code completion aims to generate code for unfinished code snippets within the context of a specified repository. Existing approaches mainly rely on retrieval-augmented generation strategies due to limitations in input sequence length. However, traditional lexical-based retrieval methods like BM25 struggle to capture code semantics, while model-based retrieval methods face challenges due to the lack of labeled data for training. Therefore, we propose RLCoder, a novel reinforcement learning framework that enables the retriever to learn to retrieve useful content for code completion without the need for labeled data. Specifically, we iteratively evaluate the usefulness of retrieved content based on the perplexity of the target code when the retrieved content is provided as additional context, and we use this evaluation as feedback to update the retriever parameters. This iterative process enables the retriever to learn from its successes and failures, gradually improving its ability to retrieve relevant and high-quality content. Considering that not all situations require information beyond the current code file and that not all retrieved context is helpful for generation, we also introduce a stop signal mechanism, allowing the retriever to decide autonomously when to retrieve and which candidates to retain. Extensive experimental results demonstrate that RLCoder consistently outperforms state-of-the-art methods on CrossCodeEval and RepoEval, achieving a 12.2% EM improvement over previous methods. Moreover, experiments show that our framework can generalize across different programming languages and further improve previous methods like RepoCoder. We provide the code and data at https://github.com/DeepSoftwareAnalytics/RLCoder.
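The perplexity-based feedback described in the abstract can be illustrated with a minimal sketch. This is not the RLCoder implementation: the names `STOP_SIGNAL`, `perplexity`, `score_candidates`, and `keep_before_stop` are hypothetical, the per-token log-probabilities are hard-coded toy values standing in for a real language model, and treating the stop signal as an empty candidate that cuts off the ranking is an assumption about one way such a mechanism could work.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical marker: an empty candidate meaning "add no retrieved context".
STOP_SIGNAL = ""


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of the target code: exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def score_candidates(
    candidates: Sequence[str],
    target_logprobs_fn: Callable[[str], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank candidates by the perplexity of the target code given each candidate.

    Lower perplexity means the candidate made the target code easier to
    predict, so it is treated as more useful.
    """
    scored = [(c, perplexity(target_logprobs_fn(c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1])


def keep_before_stop(ranked: Sequence[Tuple[str, float]]) -> List[str]:
    """Keep only candidates ranked above the stop signal (assumed behaviour)."""
    kept = []
    for candidate, _ in ranked:
        if candidate == STOP_SIGNAL:
            break
        kept.append(candidate)
    return kept


# Toy stand-in for a language model: log-probabilities of three target tokens
# when each candidate is supplied as additional context (invented values).
toy_logprobs = {
    "def helper(x): return x + 1": [-0.1, -0.2, -0.1],
    STOP_SIGNAL: [-0.9, -1.0, -0.8],
    "# unrelated license header": [-1.5, -1.4, -1.6],
}

ranked = score_candidates(list(toy_logprobs), toy_logprobs.__getitem__)
kept = keep_before_stop(ranked)
```

In this toy setting only the helper definition lowers the target's perplexity below the no-context baseline, so it is the only candidate kept. In a real training loop the ranking would presumably serve as the feedback signal for updating the retriever, but that step is outside the scope of this sketch.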

STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis

Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context

Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation

ContextModule: Improving Code Completion via Repository-level Contextual Information

ExecRepoBench: Multi-level Executable Code Completion Evaluation

Repoformer: Selective Retrieval for Repository-Level Code Completion

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

Combining Code Embedding with Static Analysis for Function-Call Completion

GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model

GraphCoder: Enhancing Repository-Level Code Completion Via Coarse-to-fine Retrieval Based on Code Context Graph

Enhancing Repository-Level Code Generation with Integrated Contextual Information

A Review of Repository Level Prompting for LLMs

RLCoder: Reinforcement Learning for Repository-Level Code Completion

Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

AI-powered Code Review with LLMs: Early Results

LLMSA: A Compositional Neuro-Symbolic Approach to Compilation-free and Customizable Static Analysis

ReACC: A Retrieval-Augmented Code Completion Framework

RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems

RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation