Parameter-efficient fine-tuning of pre-trained code models for just-in-time defect prediction
Manar Abu Talib,Ali Bou Nassif,Mohammad Azzeh,Yaser Alesh,Yaman Afadar
DOI: https://doi.org/10.1007/s00521-024-09930-5
2024-06-04
Neural Computing and Applications
Abstract:Software engineering workflows use version control systems to track changes and handle merge cases from multiple contributors. This has introduced challenges to testing because it is impractical to test whole codebases to ensure each change is defect-free, and it is not enough to test changed files alone. Just-in-time software defect prediction (JIT-SDP) systems have been proposed to solve this by predicting the likelihood that a code change is defective. Numerous techniques have been studied to build such JIT software defect prediction models, but the power of pre-trained code transformer language models in this task has been underexplored. These models have achieved human-level performance in code understanding and software engineering tasks. Inspired by that, we modeled the problem of change defect prediction as a text classification task utilizing these pre-trained models. We have investigated this idea on a recently published dataset, ApacheJIT, consisting of 44k commits. We concatenated the changed lines in each commit as one string and augmented it with the commit message and static code metrics. Parameter-efficient fine-tuning was performed for 4 chosen pre-trained models, JavaBERT, CodeBERT, CodeT5, and CodeReviewer, with either partially frozen layers or low-rank adaptation (LoRA). Additionally, experiments with the Local, Sparse, and Global (LSG) attention variants were conducted to handle long commits efficiently, which reduces memory consumption. As far as the authors are aware, this is the first investigation into the abilities of pre-trained code models to detect defective changes in the ApacheJIT dataset. Our results show that proper fine-tuning improves the defect prediction performance of the chosen models in the F 1 scores. CodeBERT and CodeReviewer achieved a 10% and 12% increase in the F 1 score over the best baseline models, JITGNN and JITLine, when commit messages and code metrics are included. Our approach sheds more light on the abilities of language models in software engineering tasks, promoting their use in production environments and ensuring that deployed software is defect-free efficiently.
computer science, artificial intelligence