Improving Code Summarization with Tree Transformer Enhanced by Position-related Syntax Complement

Jie Song, Zexin Zhang, Zirui Tang, Shi Feng, Yu Gu
DOI: https://doi.org/10.1109/tai.2024.3395231
2024-01-01
Abstract: Code summarization aims to automatically generate natural language summaries of a given source code snippet, which helps developers understand source code faster and improves software maintenance. Recent approaches that apply natural language techniques to code summarization fall short of adequately capturing the syntactic characteristics of programming languages, particularly the position-related syntax, from which the semantics of the source code can be extracted. In this paper, we present SyMer (Syntax transforMer), a model based on the Transformer architecture and enhanced with a position-related syntax complement to better capture syntactic characteristics. The position-related syntax complement takes advantage of the unambiguous relations among code tokens in the abstract syntax tree (AST), as well as the attention gathered on crucial code tokens indicated by its syntactic structure. The experimental results demonstrate that SyMer outperforms state-of-the-art models by at least 2.4% (BLEU) and 1.0% (METEOR) on the Java benchmark, and by 4.8% (BLEU), 5.1% (METEOR), and 3.2% (ROUGE-L) on the Python benchmark.