Code Generation with Hybrid of Structural and Semantic Features Retrieval

Kang Yang, Huiqun Yu, Guisheng Fan, Zijie Huang, Ziyi Zhou
DOI: https://doi.org/10.1142/s0218194022500267
IF: 1.007
2022-01-01
International Journal of Software Engineering and Knowledge Engineering
Abstract: Driven by the growing need for faster software delivery, code generation has attracted increasing attention, since it can improve code maintainability by providing coding suggestions. Among models that generate program source code from natural language (NL), the most effective approach combines a deep learning model with an intermediate structure such as an Abstract Syntax Tree. However, these models have two drawbacks: (1) the structural information in the data is underutilized and the correlation between samples is not considered; (2) they lack the ability to memorize large and complex structures, so complex code cannot be generated correctly. To address these issues, we propose HRCODE, a code generation architecture based on a Hybrid of structural and semantic features Retrieval CODE model. We first transform the NL description into an intermediate structure that carries structural features. Then, the NL and the intermediate structure are embedded into a vector through weight mixing, and we calculate the similarity score between vectors to retrieve the most relevant samples. Finally, the new input is fed into the PLBART model to generate code. Experiments show that HRCODE outperforms the state-of-the-art models by at least 4.7% on the ACC metric and at least 10.3% on the BLEU-4 score. We have released our code at https://github.com/jesokang/HRCODE.
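The retrieval step described in the abstract (weight-mixed embedding followed by similarity scoring) can be illustrated with a minimal sketch. This is not the HRCODE implementation: the hashed bag-of-tokens `embed` function stands in for a learned encoder, the mixing weight `alpha`, the sample fields `nl`, `struct`, and `code`, and the `<sep>` separator in `augment` are all hypothetical names chosen for illustration, and the abstract does not specify how the retrieved sample is combined with the query before it reaches PLBART.

```python
import math
import zlib

DIM = 256  # assumed embedding size for this toy encoder


def embed(tokens, dim=DIM):
    """Hashed bag-of-tokens vector, L2-normalised (stand-in for a learned encoder)."""
    vec = [0.0] * dim
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def mix(nl_tokens, struct_tokens, alpha=0.5):
    """Weighted mix of the semantic (NL) and structural vectors; alpha is an assumption."""
    nl_vec = embed(nl_tokens)
    st_vec = embed(struct_tokens)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(nl_vec, st_vec)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query, corpus, alpha=0.5, k=1):
    """Return the k (score, sample) pairs most similar to the query, best first."""
    q = mix(query["nl"], query["struct"], alpha)
    scored = [(cosine(q, mix(s["nl"], s["struct"], alpha)), s) for s in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def augment(query, sample):
    """Hypothetical way to build the generator input from query and retrieved sample."""
    return " ".join(query["nl"]) + " <sep> " + sample["code"]


# Toy corpus; the structural tokens imitate AST node types.
corpus = [
    {"nl": ["sort", "a", "list"], "struct": ["Call", "Name", "List"],
     "code": "sorted(xs)"},
    {"nl": ["open", "a", "file"], "struct": ["With", "Call", "Str"],
     "code": "open(path)"},
]
query = {"nl": ["sort", "a", "list"], "struct": ["Call", "Name", "List"]}

top_score, top_sample = retrieve(query, corpus)[0]
new_input = augment(query, top_sample)
```

Here `alpha` controls how much the semantic vector counts relative to the structural one; setting it to 1.0 reduces the sketch to NL-only retrieval, which is one way to check how much the structural features contribute.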