RTCoder: an Approach Based on Retrieve-template for Automatic Code Generation

Tianyou Chang, Shizhan Chen, Guodong Fan, Zhiyong Feng
DOI: https://doi.org/10.1109/icpads60453.2023.00071
2023-01-01
Abstract: For code generation, researchers have recently proposed a retrieve-template-generation approach, in which a retriever fetches similar code snippets and passes them to a generator together with the input description. However, because the retrieved code can be influenced by various data types, the model may reference unrelated content, producing discrepancies between the generated code and the target code. To mitigate this bias, we introduce RTCoder, a code generation method based on retrieve-template-generation. RTCoder generates code in three steps. Retrieve: given a natural language description, the retriever retrieves several similar code snippets from a corpus. Template: by comparing these snippets, RTCoder uses the Rabin-Karp algorithm to extract their common substrings and represents the differing substrings with spaces, forming a code template. Generator: given a specific natural language description and the corresponding code template, the generator automatically produces the concrete target code. We conducted extensive comparative experiments on three datasets using three widely used evaluation metrics. The experimental results demonstrate that: (1) Compared to mainstream code generation models, RTCoder improves all three metrics across the datasets. For instance, compared to the state-of-the-art CodeT5 base, its EM value is 5.98%, 3.34%, and 1.67% higher on the three datasets, respectively. (2) Our approach is effective for other models as well. Taking the CodeBLEU score on the Concode dataset as an example, with an RNN model the retrieve-template-generation method improves on direct generation and on retrieval-generation by 3.73% and 1.80%, respectively.
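The Template step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes only two retrieved snippets compared at character level, finds their longest common substring with a Rabin-Karp rolling hash plus a binary search on length, and recurses on the text to the left and right. The function names and the `min_len` and `gap` parameters are hypothetical, introduced only for this example. `gap` defaults to a space, as in the abstract.

```python
def _rk_match(a, b, k, base=257, mod=(1 << 61) - 1):
    """Return (i, j) with a[i:i+k] == b[j:j+k], or None (requires 1 <= k <= len)."""
    high = pow(base, k - 1, mod)  # weight of the character leaving the window

    def rolling_hashes(s):
        h = 0
        for ch in s[:k]:
            h = (h * base + ord(ch)) % mod
        yield 0, h
        for i in range(1, len(s) - k + 1):
            # Drop s[i-1], shift the window, append s[i+k-1].
            h = ((h - ord(s[i - 1]) * high) * base + ord(s[i + k - 1])) % mod
            yield i, h

    seen = {}
    for i, h in rolling_hashes(a):
        seen.setdefault(h, []).append(i)
    for j, h in rolling_hashes(b):
        for i in seen.get(h, ()):
            if a[i:i + k] == b[j:j + k]:  # verify to rule out hash collisions
                return i, j
    return None


def longest_common_substring(a, b):
    """Binary-search the length; return (i, j, length) or None if nothing is shared."""
    lo, hi, best = 1, min(len(a), len(b)), None
    while lo <= hi:
        mid = (lo + hi) // 2
        hit = _rk_match(a, b, mid)
        if hit is not None:
            best = (hit[0], hit[1], mid)
            lo = mid + 1
        else:
            hi = mid - 1
    return best


def build_template(a, b, min_len=3, gap=" "):
    """Keep shared substrings of at least min_len characters; replace the rest with gap."""
    hit = longest_common_substring(a, b)
    if hit is None or hit[2] < min_len:
        return gap if (a or b) else ""
    i, j, k = hit
    left = build_template(a[:i], b[:j], min_len, gap)
    right = build_template(a[i + k:], b[j + k:], min_len, gap)
    return left + a[i:i + k] + right


if __name__ == "__main__":
    s1 = "def add(a, b): return a + b"
    s2 = "def sub(a, b): return a - b"
    # A visible gap marker is used here only to make the output readable.
    print(build_template(s1, s2, min_len=2, gap="_"))
```

In this sketch the shared skeleton of the two snippets survives and only the differing function name and operator are blanked out, which is the kind of template the generator would then fill in from the natural language description. How RTCoder handles more than two snippets or token-level comparison is not stated in the abstract and is not modeled here.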