CLG-Trans: Contrastive Learning for Code Summarization Via Graph Attention-Based Transformer

Jianwei Zeng, Yutong He, Tao Zhang, Zhou Xu, Qiang Han
DOI: https://doi.org/10.1016/j.scico.2023.102925
IF: 1.039
2023-01-01
Science of Computer Programming
Abstract: Automated code summarization is the task of generating natural language descriptions of source code, and it is an important research topic in software engineering. Many recent methods are based on deep learning techniques, which effectively improve the performance of code summarization. Most existing code summarization methods use different kinds of neural networks to learn source code information. Some methods use graph neural networks (GNNs) to represent the abstract syntax tree (AST) and fuse the structural information of source code. However, these methods still have two important issues: 1) they cannot solve the Out-Of-Vocabulary (OOV) problem effectively; 2) the structural information of source code they can capture is limited. To address these challenges, we propose a novel automated code summarization model named CLG-Trans. The model uses the Byte Pair Encoding (BPE) algorithm and a pointer-generator network to tackle the OOV problem. It then combines a contrastive learning strategy with a dynamic graph attention mechanism to capture rich structural information from source code sequences. Experimental results on the Funcom dataset show that CLG-Trans outperforms seven state-of-the-art models (i.e., Hybrid-DRL, Ast-Attendgru, Transformer, codeGnn, Rencos, CodeBERT and SIT), with average improvements of 19.48% and 13.17% in BLEU scores and ROUGE-L score, respectively. In addition, CLG-Trans achieves improvements of 16.14% and 4.70% in BLEU scores and ROUGE-L score, respectively, over our previously proposed model DG-Trans.
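The abstract does not describe how CLG-Trans wires in its pointer-generator network, so the following is only a minimal sketch of the commonly used pointer-generator mixing step, not the authors' implementation. It illustrates why such a network helps with OOV tokens: an identifier that is absent from the output vocabulary can still receive probability by being copied from the source. The function name, the toy vocabulary, the attention weights, and the value of p_gen are all hypothetical.

```python
from collections import defaultdict


def pointer_generator_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Mix a generation distribution with a copy distribution.

    p_gen:         probability of generating from the vocabulary (0..1)
    vocab_probs:   dict mapping vocabulary word -> generation probability
    attention:     attention weight for each source position (sums to 1)
    source_tokens: source tokens aligned with the attention weights
    """
    final = defaultdict(float)
    # Generation branch: scale the vocabulary distribution by p_gen.
    for word, prob in vocab_probs.items():
        final[word] += p_gen * prob
    # Copy branch: route attention mass to the source tokens themselves,
    # including tokens that do not appear in the vocabulary.
    for token, weight in zip(source_tokens, attention):
        final[token] += (1.0 - p_gen) * weight
    return dict(final)


# Hypothetical toy inputs: "getUserId" is not in the output vocabulary.
vocab_probs = {"returns": 0.6, "the": 0.3, "<unk>": 0.1}
source_tokens = ["getUserId", "return", "id"]
attention = [0.7, 0.2, 0.1]

dist = pointer_generator_distribution(0.5, vocab_probs, attention, source_tokens)
print(dist)
```

In this toy case the out-of-vocabulary identifier "getUserId" ends up with nonzero probability purely through the copy branch, while the mixed distribution still sums to one.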