Abstract: Software engineering workflows use version control systems to track changes and merge contributions from multiple developers. This complicates testing: retesting the whole codebase for every change is impractical, yet testing only the changed files is insufficient. Just-in-time software defect prediction (JIT-SDP) systems have been proposed to address this by predicting the likelihood that a code change is defective. Numerous techniques have been studied for building JIT-SDP models, but the power of pre-trained code transformer language models in this task remains underexplored. These models have achieved human-level performance in code understanding and software engineering tasks. Inspired by that, we modeled change defect prediction as a text classification task using these pre-trained models. We investigated this idea on a recently published dataset, ApacheJIT, consisting of 44k commits. We concatenated the changed lines in each commit into one string and augmented it with the commit message and static code metrics. We performed parameter-efficient fine-tuning of four pre-trained models, JavaBERT, CodeBERT, CodeT5, and CodeReviewer, using either partially frozen layers or low-rank adaptation (LoRA). We also experimented with Local, Sparse, and Global (LSG) attention variants, which reduce memory consumption, to handle long commits efficiently. As far as the authors are aware, this is the first investigation into the ability of pre-trained code models to detect defective changes in the ApacheJIT dataset. Our results show that proper fine-tuning improves the chosen models' defect prediction performance as measured by F1 score. CodeBERT and CodeReviewer achieved a 10% and 12% increase in F1 score, respectively, over the best baseline models, JITGNN and JITLine, when commit messages and code metrics are included. Our approach sheds more light on the abilities of language models in software engineering tasks, promoting their use in production environments to help ensure, efficiently, that deployed software is defect-free.
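The input construction described in the abstract (changed lines concatenated into one string, then augmented with the commit message and static code metrics) can be illustrated with a minimal sketch. The abstract does not specify the exact layout, so everything here is an assumption made for illustration: the `Commit` container, the `[ADD]`/`[DEL]` markers, the `</s>` separator, the ordering of the parts, and the metric names `la` and `ld`.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical container for one commit; field names are assumptions."""
    message: str
    added_lines: list
    removed_lines: list
    metrics: dict = field(default_factory=dict)


def build_model_input(commit: Commit, sep: str = " </s> ") -> str:
    """Flatten a commit into a single string for a text classifier.

    The markers, separator, and part ordering are illustrative assumptions,
    not the layout used in the paper.
    """
    # Concatenate the changed lines into one string, tagging each line
    # with an assumed marker for added or removed code.
    added = " ".join(f"[ADD] {line.strip()}" for line in commit.added_lines)
    removed = " ".join(f"[DEL] {line.strip()}" for line in commit.removed_lines)
    # Render static code metrics as "name=value" tokens, sorted by name
    # so the output is deterministic.
    metrics = " ".join(f"{k}={v:g}" for k, v in sorted(commit.metrics.items()))
    # Join the non-empty parts: message, changed lines, then metrics.
    parts = [commit.message.strip(), added, removed, metrics]
    return sep.join(p for p in parts if p)


example = Commit(
    message="Fix NPE in parser",
    added_lines=["if (node != null) {"],
    removed_lines=["node.visit();"],
    metrics={"la": 1, "ld": 1},  # assumed metric names: lines added/deleted
)
text = build_model_input(example)
```

The resulting string would then be tokenized and passed to a sequence classification model. Long commits can exceed a model's context window, which is the situation the LSG attention experiments are meant to address.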

Domain Adaptation for Code Model-based Unit Test Case Generation

CodeT: Code Generation with Generated Tests

Unit Test Case Generation with Transformers and Focal Context

Using Large Language Models to Generate JUnit Tests: An Empirical Study

CodeGen-Test: An Automatic Code Generation Model Integrating Program Test Information

AutoTest: Evolutionary Code Solution Selection with Test Cases

AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

Test-Driven Multi-Task Learning with Functionally Equivalent Code Transformation for Neural Code Generation

Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment

Generative AI for Test Driven Development: Preliminary Results

TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark

Model-domain Failing Test Augmentation with Generative Adversarial Networks

Two Birds with One Stone: Boosting Code Generation and Code Search Via a Generative Adversarial Network

Test-time Adaptation in the Dynamic World with Compound Domain Knowledge Management

Automatic Unit Test Generation for Deep Learning Frameworks based on API Knowledge

CasModaTest: A Cascaded and Model-agnostic Self-directed Framework for Unit Test Generation

EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations

Improving Automated Program Repair with Domain Adaptation

Parameter-efficient fine-tuning of pre-trained code models for just-in-time defect prediction

Incorporating Domain Knowledge through Task Augmentation for Front-End JavaScript Code Generation

MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing