Integrating Non-Fourier and AST-Structural Relative Position Representations Into Transformer-Based Model for Source Code Summarization

Hsiang-Mei Liang, Chin-Yu Huang
DOI: https://doi.org/10.1109/access.2024.3354390
IF: 3.9
2024-01-01
IEEE Access
Abstract: Source code summaries play a crucial role in helping programmers comprehend the behavior of source code functions. Recent deep-learning-based approaches to Source Code Summarization have increasingly focused on Transformer-based models. These models use self-attention mechanisms to overcome the long-range dependency issue that previous models often encounter, making them a promising solution for the Source Code Summarization task. However, these models suffer from two shortcomings: 1) they handle the semantics of keywords poorly, and 2) they struggle to learn source code with complex structure. To resolve these shortcomings, our study proposes integrating Non-Fourier and AST-Structural relative position representations into a Transformer-based model for Source Code Summarization, which we have named NFASRPR-TRANS. NFASRPR-TRANS employs two types of positional encoding schemes in two different Transformer encoders. The first encoder handles the semantics of the keywords of the input source code sequence by using the Gaussian Embedder to encode the non-Fourier relative position representation of the sequence. The second encoder uses Tree Positional Encoding to learn the structural information of the Abstract Syntax Trees (ASTs), which provides relative position information in the ASTs for generating the source code summaries. Finally, we compared NFASRPR-TRANS with previous models and evaluated its performance on the Java and Python datasets using five metrics: BLEU, ROUGE-L, CIDEr, METEOR, and SPICE. NFASRPR-TRANS achieves 2%-10% improvements across all five metrics on both datasets.
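The abstract names a Gaussian Embedder for non-Fourier relative positions but does not describe its internals. As a rough illustration only, and not the authors' implementation, the sketch below shows one way a relative offset between two tokens could be embedded with Gaussian bumps instead of sinusoids. The function names, the choice of centers, and the width parameter are all hypothetical.

```python
import math


def relative_offsets(n):
    """Return the n x n matrix of signed offsets (j - i) between token positions."""
    return [[j - i for j in range(n)] for i in range(n)]


def gaussian_embed(offset, centers, width=1.0):
    """Embed one scalar offset as Gaussian responses around fixed centers.

    Hypothetical stand-in for a non-Fourier position feature: each output
    dimension peaks when the offset equals its center and decays smoothly.
    """
    return [math.exp(-((offset - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def relative_position_features(n, centers, width=1.0):
    """Build an n x n x len(centers) nested list of relative-position features."""
    return [
        [gaussian_embed(d, centers, width) for d in row]
        for row in relative_offsets(n)
    ]


# Assumed centers covering offsets from -2 to +2; a real model would learn or tune these.
centers = [-2.0, -1.0, 0.0, 1.0, 2.0]
features = relative_position_features(4, centers)
```

In a Transformer encoder, features of this kind would typically be projected and added to the attention scores or keys for each token pair; how NFASRPR-TRANS does this is specified in the paper itself, not in this abstract.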
computer science, information systems; telecommunications; engineering, electrical & electronic