FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation
Jinhao Dong, Yiling Lou, Qihao Zhu, Zeyu Sun, Zhilin Li, Wenjie Zhang, Dan Hao
DOI: https://doi.org/10.1145/3510003.3510069
2022-01-01
Abstract: Commit messages summarize the code changes of each commit in natural language. They help developers understand code changes without digging into implementation details, and they play an essential role in comprehending software evolution. To reduce the human effort of writing commit messages, researchers have proposed various automated generation techniques, including template-based, information-retrieval-based, and learning-based techniques. Although promising, previous techniques have limited effectiveness because their code change representations are coarse-grained. This work proposes a novel commit message generation technique, FIRA, which first represents code changes via fine-grained graphs and then learns to generate commit messages automatically. Unlike previous techniques, FIRA's graphs explicitly describe the code edit operations between the old version and the new version, as well as code tokens at different granularities (i.e., sub-tokens and integral tokens). Based on this graph-based representation, FIRA generates commit messages with a generation model that consists of a graph-neural-network-based encoder and a transformer-based decoder. To make both sub-tokens and integral tokens available as ingredients for commit message generation, the decoder further incorporates a novel dual copy mechanism. We perform an extensive study to evaluate the effectiveness of FIRA. Our quantitative results show that FIRA outperforms state-of-the-art techniques in terms of BLEU, ROUGE-L, and METEOR, and our ablation analysis shows that the major components of our technique all contribute positively to its effectiveness. In addition, we perform a human study to evaluate the quality of the generated commit messages from the perspective of developers, and the results consistently show the effectiveness of FIRA over the compared techniques.
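To make the idea of a fine-grained change graph concrete, the following is a minimal sketch, not FIRA's actual construction. It assumes flat token sequences for the old and new versions, uses Python's `difflib` opcodes (`replace`, `insert`, `delete`) as a stand-in for the edit operations the abstract mentions, and splits identifiers into sub-tokens with a simple camelCase/snake_case rule. The function names, node kinds, and edge labels (`split_sub_tokens`, `build_change_graph`, `has_sub_token`, `match`, and so on) are hypothetical and chosen only for illustration.

```python
import re
from difflib import SequenceMatcher


def split_sub_tokens(token):
    """Split an identifier into lowercase sub-tokens (camelCase, snake_case)."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", token)
    return [p.lower() for p in parts]


def build_change_graph(old_tokens, new_tokens):
    """Build a toy graph with integral-token, sub-token, and edit nodes.

    Returns (nodes, edges): nodes are (kind, label) pairs and edges are
    (source_index, target_index, edge_type) triples.
    """
    nodes = []
    edges = []

    def add_node(kind, label):
        nodes.append((kind, label))
        return len(nodes) - 1

    def add_tokens(tokens, version):
        ids = []
        for tok in tokens:
            tid = add_node(f"{version}_token", tok)
            if ids:
                # Link consecutive integral tokens of the same version.
                edges.append((ids[-1], tid, "next"))
            ids.append(tid)
            subs = split_sub_tokens(tok)
            if len(subs) > 1:
                # Only compound identifiers get explicit sub-token nodes.
                for sub in subs:
                    sid = add_node("sub_token", sub)
                    edges.append((tid, sid, "has_sub_token"))
        return ids

    old_ids = add_tokens(old_tokens, "old")
    new_ids = add_tokens(new_tokens, "new")

    # Assumption: difflib opcodes approximate the edit operations.
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for i, j in zip(range(i1, i2), range(j1, j2)):
                edges.append((old_ids[i], new_ids[j], "match"))
        else:
            eid = add_node("edit", tag)
            for i in range(i1, i2):
                edges.append((eid, old_ids[i], "edits_old"))
            for j in range(j1, j2):
                edges.append((eid, new_ids[j], "edits_new"))
    return nodes, edges
```

Calling `build_change_graph(["int", "maxSize", "=", "10", ";"], ["int", "maxSize", "=", "20", ";"])` yields a graph in which unchanged tokens are linked across versions, the changed literal is attached to an edit node, and `maxSize` is connected to its sub-tokens. A graph of this shape could then be fed to a graph encoder, while a decoder with a copy mechanism could draw on either the integral-token nodes or the sub-token nodes.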