Abstract: Automatic code annotation generation aims to produce readable annotations that describe the functionality of source code, which can assist software developers and programmers. Previous methods follow an encoder-decoder structure in which the encoder uses abstract syntax trees (ASTs) to encode the syntactic structure of code fragments. However, the AST alone cannot fully express the complicated control structures, data flows, or dependencies of source code, leading to sub-optimal annotations. Moreover, the same functionality can be implemented in various ways, with possibly different structures and token names, yet most methods treat code fragments independently and do not exploit the similarities among them. In this paper, we present HANCode2Seq, an automatic code annotation generation method that uses a heterogeneous code representation graph. Specifically, we construct the heterogeneous graph by combining multiple code-induced graphs: abstract syntax trees, control flow graphs, data flow graphs, and program dependency graphs. A heterogeneous graph attention network is then applied to extract the semantics and syntactic structure of the source code fragments. Furthermore, we present a novel adaptive code similarity graph whose nodes are code fragments. The representation of a code fragment is enhanced by aggregating information from similar fragments on the graph, which may reduce the ambiguity of the code. Experimental results on real datasets show that our proposed model outperforms the baselines and produces more fluent and readable code annotations.
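To make the idea of combining several code-induced graphs into one heterogeneous graph concrete, the following is a minimal sketch, not the HANCode2Seq construction itself. It assumes Python source as input and uses only the standard `ast` module. The function name `build_hetero_graph`, the edge-type labels `"ast"` and `"data_flow"`, and the naive def-use rule (top-level, straight-line statements only) are all illustrative assumptions. Control flow and program dependency edges are omitted.

```python
import ast
from collections import defaultdict


def build_hetero_graph(source):
    """Toy heterogeneous graph: one node set, edges grouped by type.

    Hypothetical helper for illustration only. Returns (nodes, edges),
    where nodes[i] is an AST node type name and edges maps an edge-type
    label to a list of (src_index, dst_index) pairs.
    """
    tree = ast.parse(source)
    nodes = []                 # node index -> AST node type name
    edges = defaultdict(list)  # edge type -> list of (src, dst)
    index_of = {}              # id(ast node) -> node index

    def visit(node):
        idx = len(nodes)
        nodes.append(type(node).__name__)
        index_of[id(node)] = idx
        for child in ast.iter_child_nodes(node):
            # Load/Store markers carry no structure of their own; skip them.
            if isinstance(child, ast.expr_context):
                continue
            edges["ast"].append((idx, visit(child)))
        return idx

    visit(tree)

    # Naive data flow: link the most recent assignment of a variable to
    # each later read. Only top-level statements are handled, in order.
    last_def = {}
    for stmt in tree.body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Reads are resolved before this statement's own writes take effect.
        for n in names:
            if isinstance(n.ctx, ast.Load) and n.id in last_def:
                edges["data_flow"].append((last_def[n.id], index_of[id(n)]))
        for n in names:
            if isinstance(n.ctx, ast.Store):
                last_def[n.id] = index_of[id(n)]

    return nodes, dict(edges)


nodes, edges = build_hetero_graph("x = 1\ny = x + 2\n")
print(nodes)
print(edges)
```

Both edge types share one node set, so a heterogeneous graph neural network could apply type-specific attention over `edges["ast"]` and `edges["data_flow"]`. That attention step and the adaptive code similarity graph are not shown here.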

Code2Text: Dual Attention Syntax Annotation Networks for Structure-Aware Code Translation

Automatic Code Annotation Generation Based on Heterogeneous Graph Structure

Assessing the Effectiveness of Syntactic Structure to Learn Code Edit Representations

CodeAttention: Translating Source Code to Comments by Exploiting the Code Constructs

StructCoder: Structure-Aware Transformer for Code Generation

From Code to Natural Language: Type-Aware Sketch-Based Seq2Seq Learning

PinNet: Pinpoint Instructive Information for Retrieval Augmented Code-to-Text Generation

Automatic Code Annotation Generation Based on Multi-dimensional Heterogeneous Graph Structure

AST-Trans: Code Summarization with Efficient Tree-Structured Attention

Code Attention: Translating Code to Comments by Exploiting Domain Features

code2seq: Generating Sequences from Structured Representations of Code

CSA-Trans: Code Structure Aware Transformer for AST

AST-trans

Code Search based on Context-aware Code Translation

Text2PyCode: Machine Translation of Natural Language Intent to Python Source Code

Tree-to-tree Neural Networks for Program Translation

Data Augmentation for Code Translation with Comparable Corpora and Multiple References

Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models

Syntax-Aware Retrieval Augmented Code Generation

Structurally-Enhanced Approach for Automatic Code Transformation