RRGcode: Deep hierarchical search-based code generation

Qianwen Gou, Yunwei Dong, Yujiao Wu, Qiao Ke
DOI: https://doi.org/10.1016/j.jss.2024.111982
IF: 3.5
2024-01-30
Journal of Systems and Software
Abstract: Retrieval-augmented code generation strengthens the generation model by using a retrieval model to select relevant code snippets from a code corpus. The synergy between retrieval and generation ensures that the generated code closely corresponds to the intended functionality. Existing methods simply feed the retrieved results to the generation model. However, if the retrieval corpus contains erroneous or sub-optimal code examples, the model risks replicating these mistakes in the generated code. To tackle this problem, we propose RRGcode (Retrieval, Re-ranking, and Generation for code generation), a deep hierarchical search-based code generation framework that refines the initial retrieved code rankings, reducing the risk of replicating errors from the retrieval corpus and enhancing the generation of higher-quality, more reliable code. Specifically, it first retrieves relevant code candidates from a large code corpus. Then, a re-ranking model reconstructs the search space through a detailed semantic comparison between the code candidates and the query, ensuring that only the most relevant and accurate candidates are considered. Finally, the re-ranked top-K code snippets, together with the query, serve as input to the code generation model. Extensive experiments evaluating the code generated by RRGcode demonstrate state-of-the-art performance on code generation tasks.
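The three-stage flow described in the abstract (retrieve, re-rank, then generate from the query plus the top-K snippets) can be sketched as a toy Python pipeline. This is not the authors' implementation: the paper uses a learned retrieval model, a learned re-ranking model, and a neural generation model, whereas this sketch assumes bag-of-words cosine similarity for retrieval, a hypothetical stopword-filtered term-coverage score for re-ranking, and stops at building the input string a generator would receive. All names here (`retrieve`, `rerank`, `coverage_score`, `build_generation_input`, `rrg_pipeline`, `STOPWORDS`) are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, used only by the toy re-ranker below.
STOPWORDS = {"a", "an", "and", "of", "the", "its", "to"}


def tokenize(text: str) -> list[str]:
    """Split camelCase and snake_case identifiers into lowercase word tokens."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[str], n: int) -> list[str]:
    """Stage 1 (assumed stand-in): cheap lexical retrieval of the n closest snippets."""
    q_vec = Counter(tokenize(query))
    scored = [(cosine(q_vec, Counter(tokenize(code))), code) for code in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return [code for _, code in scored[:n]]


def coverage_score(query: str, code: str) -> float:
    """Toy stand-in for a learned re-ranker: share of content query terms found in the code."""
    q_terms = set(tokenize(query)) - STOPWORDS
    c_terms = set(tokenize(code))
    return len(q_terms & c_terms) / len(q_terms) if q_terms else 0.0


def rerank(query: str, candidates: list[str], scorer=coverage_score) -> list[str]:
    """Stage 2: reorder the retrieved candidates with a finer-grained scorer."""
    return sorted(candidates, key=lambda code: scorer(query, code), reverse=True)


def build_generation_input(query: str, top_k: list[str]) -> str:
    """Stage 3: pack the re-ranked top-K snippets and the query into one model input."""
    parts = [f"# Retrieved example {i + 1}\n{code}" for i, code in enumerate(top_k)]
    parts.append(f"# Query: {query}")
    return "\n\n".join(parts)


def rrg_pipeline(query: str, corpus: list[str], n_retrieve: int = 10, k: int = 1) -> str:
    """Retrieve n_retrieve candidates, re-rank them, and keep the top k for generation."""
    candidates = retrieve(query, corpus, n_retrieve)
    return build_generation_input(query, rerank(query, candidates)[:k])


if __name__ == "__main__":
    corpus = [
        "def read_lines(path):\n    with open(path) as f:\n        return f.read().splitlines()",
        "def write_file(path, text):\n    with open(path, 'w') as f:\n        f.write(text)",
        "def add(a, b):\n    return a + b",
    ]
    query = "read a file and return a list of its lines"
    print(retrieve(query, corpus, 3)[0].splitlines()[0])  # def add(a, b):
    print(rrg_pipeline(query, corpus, n_retrieve=3, k=1))
```

In this toy corpus, first-stage retrieval ranks `add(a, b)` highest because it repeats the token "a" that also appears twice in the query; the re-ranker ignores such filler words and promotes `read_lines`. This illustrates, in miniature, why a second, finer-grained scoring pass can change which examples reach the generator.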
Subject categories: Computer Science, Theory & Methods; Software Engineering