COLARE: Commit Classification Via Fine-grained Context-aware Representation of Code Changes

Qunhong Zeng, Yuxia Zhang, Zeyu Sun, Yujie Guo, Hui Liu
DOI: https://doi.org/10.1109/saner60148.2024.00082
2024-01-01
Abstract: Commit classification for maintenance activities is of critical importance for both industry and academia. State-of-the-art approaches either treat code changes as plain text or rely on manually identified features. Directly applying the most advanced model of code change representation to commit classification faces two limitations: (1) coarse-grained diff comparison neglects the distance between modified code lines; (2) key context information about hunk modifications and file categories is missing. This study proposes a novel classification model, COLARE, which compares code changes at the hunk level, takes fine-grained features based on the categories of changed files, and aggregates them with the representation of commit messages. The evaluation results show that our model outperforms state-of-the-art techniques by 7.24% and 7.35% in accuracy and macro F1 score, respectively. We also manually labeled a multi-language dataset and evaluated our approach on it. The results further confirm that our approach achieves the best performance over three baselines, including ChatGPT (3.5). An ablation study demonstrates the effectiveness of the major components of our technique.
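To make the idea of hunk-level comparison with file categories concrete, the following is a minimal sketch, not COLARE's actual implementation. It assumes input in `git diff` format, and the function names `split_into_hunks` and `categorize_file`, as well as the four file categories, are hypothetical choices made for illustration; the abstract does not specify the paper's taxonomy or feature extraction.

```python
import re

# Matches a unified-diff hunk header such as "@@ -10,2 +10,3 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def categorize_file(path):
    """Assign a coarse category to a changed file.

    The categories here are illustrative assumptions, not the paper's.
    """
    name = path.lower()
    if "test" in name:
        return "test"
    if name.endswith((".md", ".rst", ".txt")):
        return "documentation"
    if name.endswith((".json", ".yml", ".yaml", ".xml", ".toml")):
        return "config"
    return "source"


def split_into_hunks(diff_text):
    """Split `git diff` output into hunks, each tagged with its file category.

    Assumes every file section starts with a "diff --git" line.
    """
    hunks = []
    current_file = None
    current = None  # the hunk being filled, or None while reading file headers
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            current = None  # a new file section begins
        elif current is None and line.startswith("+++ "):
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
        elif HUNK_HEADER.match(line):
            current = {
                "file": current_file,
                "category": categorize_file(current_file or ""),
                "removed": [],
                "added": [],
            }
            hunks.append(current)
        elif current is not None:
            if line.startswith("+"):
                current["added"].append(line[1:])
            elif line.startswith("-"):
                current["removed"].append(line[1:])
    return hunks
```

Keeping removed and added lines paired per hunk, rather than pooling them across the whole commit, is what lets a downstream model compare each modification with its own nearby counterpart; the category tag supplies the file-level context the abstract mentions.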