RepoMinCoder: Improving Repository-Level Code Generation Based on Information Loss Screening

Yifan Li, Ensheng Shi, Dewu Zheng, Kefeng Duan, Jiachi Chen, Yanlin Wang
DOI: https://doi.org/10.1145/3671016.3674819
2024-01-01
Abstract: The repository-level code generation task involves generating code at a specified location based on unfinished code and its repository context. Existing research relies mainly on retrieval-augmented generation methods to complete code, and focuses on improving retrieval results based on the unfinished code, but rarely pays attention to the information loss that occurs during prompt encoding. In this paper, we propose RepoMinCoder, a novel repository-level code generation framework that builds on the canonical retrieval-augmented generation method and adds another round of screening and ranking based on information loss. Extensive experimental results demonstrate that RepoMinCoder consistently outperforms state-of-the-art methods on the public benchmark RepoEval, achieving improvements of 3.3% in EM (Exact Match) and 2.1% in ES (Edit Similarity) over previous methods. Moreover, we conduct additional experiments to study the effect of various factors in the existing code generation pipeline, including the number of retrieval candidates, the slicing strategy of the retrieval database, and different prompting strategies.
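The abstract describes the pipeline only at a high level: retrieve candidate snippets from the repository using the unfinished code as the query, then run an extra round of screening and ranking based on information loss before building the prompt. The sketch below illustrates that shape under stated assumptions and is not the paper's implementation. Jaccard token retrieval, line-based truncation to a prompt budget, and the `information_loss` proxy (the share of query-overlapping tokens a snippet loses when cut to the budget) are hypothetical stand-ins; the `max_loss` threshold and the ranking key are likewise assumptions. `exact_match` and `edit_similarity` follow one common definition of EM and ES (ES as one minus the normalized Levenshtein distance), which may differ in detail from RepoEval's own scoring scripts.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str                     # hypothetical: file the snippet came from
    code: str
    retrieval_score: float = 0.0


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; a simple stand-in for a real retriever."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def retrieve(query: str, database: dict[str, str], k: int) -> list[Candidate]:
    """Canonical RAG step: top-k snippets most similar to the unfinished code."""
    scored = [Candidate(p, c, jaccard(query, c)) for p, c in database.items()]
    scored.sort(key=lambda c: c.retrieval_score, reverse=True)
    return scored[:k]


def truncate(code: str, line_budget: int) -> str:
    """Keep only the first `line_budget` lines, as if limited by prompt space."""
    return "\n".join(code.splitlines()[:line_budget])


def information_loss(query: str, cand: Candidate, line_budget: int) -> float:
    """HYPOTHETICAL proxy, not the paper's measure: fraction of the query tokens
    present in the full snippet that disappear after truncation.
    Returns 1.0 when the snippet shares no tokens with the query."""
    q = set(query.split())
    full = q & set(cand.code.split())
    if not full:
        return 1.0
    kept = q & set(truncate(cand.code, line_budget).split())
    return 1.0 - len(kept) / len(full)


def screen_and_rank(query: str, candidates: list[Candidate],
                    line_budget: int, max_loss: float) -> list[Candidate]:
    """Extra round after retrieval: drop candidates whose loss exceeds
    `max_loss`, then order by lower loss, breaking ties by retrieval score."""
    scored = [(information_loss(query, c, line_budget), c) for c in candidates]
    kept = [(loss, c) for loss, c in scored if loss <= max_loss]
    kept.sort(key=lambda lc: (lc[0], -lc[1].retrieval_score))
    return [c for _, c in kept]


def build_prompt(query: str, candidates: list[Candidate], line_budget: int) -> str:
    """Concatenate truncated context snippets, then the unfinished code."""
    blocks = [f"# {c.path}\n" + truncate(c.code, line_budget) for c in candidates]
    return "\n\n".join(blocks + [query])


def exact_match(pred: str, ref: str) -> bool:
    """EM: prediction equals reference after stripping outer whitespace."""
    return pred.strip() == ref.strip()


def edit_similarity(pred: str, ref: str) -> float:
    """ES: 1 - Levenshtein(pred, ref) / max(len(pred), len(ref))."""
    a, b = pred.strip(), ref.strip()
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))
```

For example, with query `"def f with open"`, a snippet `"def f\nwith open"` has a loss of 0.5 under a one-line budget (its `with open` line is cut) and 0.0 under a two-line budget, so a `max_loss` of 0.4 screens it out only in the tighter setting. Filtering before ranking mirrors the abstract's "another round of screening and ranking", but how RepoMinCoder actually measures information loss should be taken from the paper itself.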