Abstract: Tools capable of automatic code generation have the potential to augment programmers' capabilities. While straightforward code retrieval is incorporated into many IDEs, an emerging area is explicit code generation. Code generation is currently approached as a Machine Translation task, with Recurrent Neural Network (RNN) based encoder-decoder architectures trained on code-description pairs. In this work we introduce and study modern Transformer architectures for this task. We further propose a new model called the Relevance Transformer that incorporates external knowledge using pseudo-relevance feedback. The Relevance Transformer biases the decoding process to be similar to existing retrieved code while enforcing diversity. We perform experiments on multiple standard benchmark datasets for code generation including Django, Hearthstone, and CoNaLa. The results show improvements over state-of-the-art methods based on BLEU evaluation. The Relevance Transformer model shows the potential of Transformer-based architectures for code generation and introduces a method of incorporating pseudo-relevance feedback during inference.

What problem does this paper attempt to address?

The paper addresses two key challenges in code generation: improving the accuracy and the diversity of the generated code. Specifically, the authors focus on how to use external knowledge (such as existing code snippets) to improve the generation of code from natural-language descriptions.

### Problem Background

Programmers routinely need to consult a large number of programming languages, libraries, and technologies, so they frequently search online for example code or syntax references. This practice prolongs development and reduces productivity. Existing code retrieval tools can help programmers find relevant code, but they are usually not flexible enough to adapt to different contexts.

### Limitations of Existing Methods

Most current code generation methods treat the task as neural machine translation (NMT) and use recurrent neural network (RNN) based encoder-decoder architectures. Although these methods perform well on certain tasks, they have the following problems:

1. **Ineffective combination of external knowledge**: Existing models have difficulty integrating external knowledge (such as existing code snippets) into the generation process.
2. **Lack of diversity and accuracy in generation results**: The generated code may lack diversity and is prone to errors in complex situations.

### Solutions Proposed in the Paper

To solve these problems, the authors introduce a new model, the Relevance Transformer. Its main innovations are:

1. **Pseudo-Relevance Feedback**: The model retrieves code snippets related to the input description and adjusts the decoding process according to the words common to those snippets, which improves the relevance and accuracy of the generated code.
2. **Copy Mechanism**: The model can copy specific words (such as variable names and method identifiers) directly from the input, which addresses the rare words encountered during generation.

### Experimental Verification

The authors conducted experiments on multiple standard benchmark datasets (Django, Hearthstone, and CoNaLa). The results show that the Relevance Transformer outperforms existing state-of-the-art methods in terms of BLEU score, especially on the CoNaLa dataset.

### Summary

By introducing pseudo-relevance feedback and a copy mechanism, the Relevance Transformer combines external knowledge more effectively and generates more accurate and diverse code snippets. The method improves the quality of code generation and also suggests an approach for generation tasks in other fields.

### Formula Summary

The formulas in the paper mainly cover probability distributions and their interpolation:

1. **Interpolated probability distribution**:
\[ P(y_t \mid x, y_{<t}) = \left[ \lambda \cdot P_{\text{NMT}}(x, y_{<t}) + (1-\lambda) \cdot P_{\text{retrieval}}(x, y_t) \cdot P_{\text{context}}(y_{<t}, y_t) \right] \cdot Z \]
where \( P_{\text{NMT}} \) is the original neural machine translation distribution, \( P_{\text{retrieval}} \) is the retrieval-based distribution, \( P_{\text{context}} \) is the context repetition penalty term, and \( Z \) is the normalization constant.
2. **Weighted score of retrieval results**:
\[ P_{\text{retrieval}}(x, y_t) = \left[ 1 - \mathbb{I}_{V_f}(y_t) \right] \cdot \sum_{d \in R(x, K)} P_{\text{score}}(y_t, d) \cdot P_{\text{BM25}}(x, d) \]
where \( \mathbb{I}_{V_f} \) is the indicator function, \( R(x, K) \) is the set of the top \( K \) retrieved documents, \( P_{
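The interpolation in the formula summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several pieces are assumptions made only for illustration: the function names are hypothetical, \( P_{\text{score}} \) is taken to be a token's relative frequency in a retrieved snippet (its definition is cut off in the text above), \( P_{\text{BM25}} \) is passed in as precomputed per-snippet weights, and \( P_{\text{context}} \) is modeled as a constant down-weighting of tokens that were already generated.

```python
from collections import Counter


def retrieval_distribution(vocab, docs, doc_weights, frequent_tokens):
    """Score each vocabulary token by its presence in the retrieved snippets.

    docs is a list of token lists; doc_weights stands in for P_BM25(x, d).
    """
    scores = {tok: 0.0 for tok in vocab}
    for doc, weight in zip(docs, doc_weights):
        counts = Counter(doc)
        total = sum(counts.values())
        if total == 0:
            continue
        for tok in vocab:
            # Assumption: P_score(y_t, d) is the token's relative frequency in d.
            scores[tok] += (counts[tok] / total) * weight
    # Indicator term [1 - I_{V_f}(y_t)]: tokens in V_f receive no retrieval boost.
    for tok in frequent_tokens:
        if tok in scores:
            scores[tok] = 0.0
    return scores


def context_penalty(vocab, generated, penalty=0.5):
    """Assumed form of P_context: down-weight tokens that were already emitted."""
    seen = set(generated)
    return {tok: (penalty if tok in seen else 1.0) for tok in vocab}


def interpolate(p_nmt, p_retrieval, p_context, lam=0.7):
    """Mix the NMT distribution with the retrieval term, then renormalize."""
    raw = {
        tok: lam * p_nmt[tok] + (1.0 - lam) * p_retrieval[tok] * p_context[tok]
        for tok in p_nmt
    }
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("all scores are zero; cannot normalize")
    # Z in the formula corresponds to 1 / total here.
    return {tok: value / total for tok, value in raw.items()}


# Toy decoding step: uniform NMT distribution, two retrieved snippets.
vocab = ["open", "(", "read", "close"]
p_nmt = {tok: 0.25 for tok in vocab}
p_ret = retrieval_distribution(
    vocab,
    docs=[["open", "(", "read"], ["read", "("]],
    doc_weights=[0.6, 0.4],
    frequent_tokens={"("},
)
p_ctx = context_penalty(vocab, generated=["open"])
p_final = interpolate(p_nmt, p_ret, p_ctx, lam=0.7)
print(p_final)
```

In this toy step, `read` appears in both retrieved snippets and has not been generated yet, so it ends up with the largest probability. `open` is boosted less because it was already emitted, and `(` gets no boost because it is in the frequent-token set.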

Relevance Transformer: Generating Concise Code Snippets with Relevance Feedback
