Abstract: Recent advancements in code-fluent Large Language Models (LLMs) enabled the research on repository-level code editing. In such tasks, the model navigates and modifies the entire codebase of a project according to request. Hence, such tasks require efficient context retrieval, i.e., navigating vast codebases to gather relevant context. Despite the recognized importance of context retrieval, existing studies tend to approach repository-level coding tasks in an end-to-end manner, rendering the impact of individual components within these complicated systems unclear. In this work, we decouple the task of context retrieval from the other components of the repository-level code editing pipelines. We lay the groundwork to define the strengths and weaknesses of this component and the role that reasoning plays in it by conducting experiments that focus solely on context retrieval. We conclude that while the reasoning helps to improve the precision of the gathered context, it still lacks the ability to identify its sufficiency. We also outline the ultimate role of the specialized tools in the process of context gathering. The code supplementing this paper is available at https://github.com/JetBrains-Research/ai-agents-code-editing.

What problem does this paper attempt to address?

The paper addresses how to perform context retrieval effectively in repository-level code-editing tasks. Specifically, it focuses on how to accurately locate the code relevant to a task within a large codebase, thereby improving the accuracy and efficiency of code editing.

### Problem Background

With the recent development of large language models (LLMs), researchers have begun to explore repository-level code-editing tasks. Such tasks require the model to navigate and modify the codebase of an entire project according to a user request. **Context retrieval**, the ability to find the relevant code in a large codebase, has therefore become a key challenge. Although existing research recognizes the importance of context retrieval, most studies handle these tasks end to end, which makes it difficult to evaluate the contribution of each component. In addition, existing methods often judge context retrieval by final task performance and ignore the quality of the retrieved context itself.

### Core Problems of the Paper

To better understand the role of context retrieval, the paper separates it from the other components of the pipeline and studies its performance in isolation. Specifically, the paper aims to answer the following questions:

1. **The influence of reasoning on context retrieval**: Can reasoning improve the precision of context retrieval? Can it judge whether the collected context is sufficient?
2. **The role of specialized tools**: Can code-structure-aware tools significantly improve context retrieval?
3. **The relationship between context length and recall**: Is there a direct relationship between the length of the retrieved context and recall?

### Experimental Design

To answer these questions, the paper runs a series of experiments with different context-retrieval strategies and evaluates them using Precision, Recall, and F1-score. The experiments use two datasets, SWE-bench Lite and LCA Code Editing, which cover code-editing tasks of different levels of complexity.

### Main Findings

1. **Reasoning improves precision**: Stronger reasoning significantly improves the precision of context retrieval, especially at the file level and the entity level.
2. **Context length affects recall**: The length of the retrieved context is positively correlated with recall; a longer context usually yields higher recall.
3. **Specialized tools greatly improve performance**: Strategies that use code-structure-aware tools perform well on all metrics, and the effect is more pronounced when they are combined with reasoning.

### Conclusion

The main conclusion is that reasoning plays a crucial role in improving the precision of context retrieval, while context length mainly affects recall. Specialized tools are also crucial for context retrieval. Future research should explore how to better combine reasoning and specialized tools to improve the overall performance of code-editing tasks.

### Future Research Directions

The paper points out that future work should focus on developing more effective reasoning methods for deciding whether the collected context is sufficient to solve the problem. Studying Agent-Computer Interfaces (ACI), that is, how to design the interaction between an LLM and its external environment so as to maximize its reasoning potential, is another important research direction.
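The Precision, Recall, and F1-score metrics named in the experimental design can be illustrated with a minimal sketch. It assumes that retrieved context and ground-truth context are both represented as sets of identifiers (file paths here; entity names would work the same way). The function name `score_retrieval`, the `RetrievalScores` type, and the example paths are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalScores:
    precision: float
    recall: float
    f1: float


def score_retrieval(retrieved: set[str], relevant: set[str]) -> RetrievalScores:
    """Compare retrieved context items against the items that are actually relevant.

    Assumption: both arguments are sets of identifiers at the same granularity
    (for example, file paths for file-level scoring).
    """
    hits = len(retrieved & relevant)
    # Precision: share of retrieved items that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant items that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    # F1: harmonic mean of precision and recall (0 when there are no hits).
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return RetrievalScores(precision, recall, f1)


# Hypothetical file-level example: four files retrieved, two files actually edited.
retrieved_files = {"src/parser.py", "src/utils.py", "tests/test_parser.py", "README.md"}
relevant_files = {"src/parser.py", "src/lexer.py"}

scores = score_retrieval(retrieved_files, relevant_files)
print(scores)
```

In this example one of the four retrieved files is relevant, so precision is 0.25, and one of the two relevant files was found, so recall is 0.5. Retrieving more files can only keep recall the same or raise it, but it tends to lower precision, which is consistent with the summary's observation that context length mainly affects recall.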

On The Importance of Reasoning for Context Retrieval in Repository-Level Code Editing

Retrieval Meets Reasoning: Dynamic In-Context Editing for Long-Text Understanding

ContextModule: Improving Code Completion via Repository-level Contextual Information

RLCoder: Reinforcement Learning for Repository-Level Code Completion

RepoFusion: Training Code Models to Understand Your Repository

GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model

GraphCoder: Enhancing Repository-Level Code Completion Via Coarse-to-fine Retrieval Based on Code Context Graph

Repoformer: Selective Retrieval for Repository-Level Code Completion

On the Impacts of Contexts on Repository-Level Code Generation

ReACC: A Retrieval-Augmented Code Completion Framework

Reasoning Makes Good Annotators : an Automatic Task-specific Rules Distilling Framework for Low-resource Relation Extraction

Reasoning Runtime Behavior of a Program with LLM: How Far Are We?

When Do Program-of-Thoughts Work for Reasoning?

RepoQA: Evaluating Long Context Code Understanding

REPOFUSE: Repository-Level Code Completion with Fused Dual Context

Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion

A Review of Repository Level Prompting for LLMs

Reasoning Graph Enhanced Exemplars Retrieval for In-Context Learning

Re-Reading Improves Reasoning in Large Language Models

Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities