Abstract: Software engineering workflows use version control systems to track changes and to merge contributions from multiple developers. This poses a challenge for testing: it is impractical to test the whole codebase to ensure that each change is defect-free, yet testing only the changed files is not enough. Just-in-time software defect prediction (JIT-SDP) systems have been proposed to address this by predicting the likelihood that a code change is defective. Numerous techniques have been studied for building such models, but the power of pre-trained code transformer language models in this task has been underexplored. These models have achieved human-level performance in code understanding and software engineering tasks. Inspired by that, we modeled change defect prediction as a text classification task using these pre-trained models. We investigated this idea on a recently published dataset, ApacheJIT, consisting of 44k commits. We concatenated the changed lines in each commit into one string and augmented it with the commit message and static code metrics. Parameter-efficient fine-tuning was performed for four chosen pre-trained models, JavaBERT, CodeBERT, CodeT5, and CodeReviewer, with either partially frozen layers or low-rank adaptation (LoRA). Additionally, experiments with the Local, Sparse, and Global (LSG) attention variants were conducted to handle long commits efficiently and reduce memory consumption. To the best of our knowledge, this is the first investigation into the ability of pre-trained code models to detect defective changes in the ApacheJIT dataset. Our results show that proper fine-tuning improves the defect prediction performance of the chosen models in terms of F1 score. CodeBERT and CodeReviewer achieved 10% and 12% increases in F1 score, respectively, over the best baseline models, JITGNN and JITLine, when commit messages and code metrics are included. Our approach sheds more light on the abilities of language models in software engineering tasks and promotes their use in production environments to help ensure, efficiently, that deployed software is defect-free.
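The input construction described in the abstract (changed lines concatenated into one string, then augmented with the commit message and static code metrics) might look roughly like the sketch below. The `Commit` container, the `build_input` helper, the field ordering, the `+`/`-` prefixes, the metric names, and the `</s>` separator are all illustrative assumptions; the abstract does not specify the exact format used.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical container for one commit; field names are illustrative."""
    message: str
    added_lines: list[str]
    removed_lines: list[str]
    metrics: dict[str, float] = field(default_factory=dict)


def build_input(commit: Commit, sep: str = " </s> ") -> str:
    """Flatten a commit into a single string for a text classifier.

    The ordering (message, metrics, changed lines) and the separator
    are assumptions, not the format reported by the authors.
    """
    # Render metrics as "name=value" pairs in a stable (sorted) order.
    metric_text = " ".join(
        f"{name}={value:g}" for name, value in sorted(commit.metrics.items())
    )
    # Mark added and removed lines so the model can tell them apart.
    changed = [f"+ {line.strip()}" for line in commit.added_lines]
    changed += [f"- {line.strip()}" for line in commit.removed_lines]
    parts = [commit.message.strip(), metric_text, " ".join(changed)]
    # Skip empty parts (e.g. a commit without metrics).
    return sep.join(part for part in parts if part)


example = Commit(
    message="Fix NPE",
    added_lines=["if (x != null) {"],
    removed_lines=["x.run();"],
    metrics={"la": 1, "ld": 1},  # hypothetical metric names
)
text = build_input(example)
```

The resulting string would then be tokenized and passed to a sequence classification head on top of the chosen pre-trained model; long commits are where a truncation limit or an efficient attention variant becomes relevant.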
