A Comparison of Pretrained Models for Classifying Issue Reports

Jueun Heo, Gibeom Kwon, Changwon Kwak, Seonah Lee
DOI: https://doi.org/10.1109/access.2024.3408688
IF: 3.9
2024-06-12
IEEE Access
Abstract: Issues are evolving requirements in software engineering. They are the main factors that increase the cost of software evolution. To help developers manage issues, GitHub provides issue labeling mechanisms in issue management systems. However, manually labeling issue reports still imposes a considerable workload on developers. To ease developers' burden, researchers have proposed automatically classifying issue reports. To improve the classification accuracy, researchers have adopted deep learning techniques and pretrained models. However, general-domain pretrained models such as RoBERTa have limitations in understanding the contexts of software engineering tasks. In this paper, we create a pretrained model, IssueBERT, with issue data to understand whether a domain-specific pretrained model could improve the accuracy of issue report classification. We also adopt and explore several pretrained models in the software engineering domain, namely, CodeBERT, BERTOverflow, and seBERT. We conduct a comparative experiment on these pretrained models to evaluate their performance in classifying issue reports. Our comparison results show that IssueBERT outperforms the other pretrained models. Notably, IssueBERT yields an average F1 score that is 1.74% higher than that of seBERT and 3.61% higher than that of RoBERTa, even though IssueBERT was pretrained with much less data than seBERT and RoBERTa.
computer science, information systems; telecommunications; engineering, electrical & electronic