Parameter-Efficient Multi-classification Software Defect Detection Method Based on Pre-trained LLMs

Xuanye Wang, Lu Lu, Zhanyu Yang, Qingyan Tian, Haisha Lin

DOI: https://doi.org/10.1007/s44196-024-00551-3

IF: 2.259

2024-06-20

International Journal of Computational Intelligence Systems

Abstract: Software Defect Detection (SDD) has always been critical to the development life cycle. A stable defect detection system can not only alleviate the workload of software testers but also enhance the overall efficiency of software development. Researchers have recently proposed various artificial intelligence-based SDD methods and achieved significant advancements. However, these methods still exhibit limitations in terms of reliability and usability. Therefore, we introduce MSDD-(IA)3, a novel framework leveraging the pre-trained CodeT5+ and (IA)3 for parameter-efficient multi-classification SDD. This framework constructs a detection model based on pre-trained CodeT5+ to generate code representations while capturing defect-prone features. Considering the high overhead of pre-trained LLMs, we inject (IA)3 vectors into specific layers, where only these injected parameters are updated to reduce the training cost. Furthermore, leveraging the properties of the pre-trained CodeT5+, we design a novel feature sequence that enriches the input data by combining source code with Natural Language (NL)-based expert metrics. Our experimental results on 64K real-world Python snippets show that MSDD-(IA)3 outperforms state-of-the-art SDD methods, including PM2-CNN, in terms of F1-weighted, Recall-weighted, Precision-weighted, and Matthews Correlation Coefficient. Notably, the trainable parameters of MSDD-(IA)3 amount to only 0.04% of those of the original CodeT5+. Our experimental data and code are available at https://gitee.com/wxyzjp123/msdd-ia3/.
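
The abstract describes (IA)3 only at a high level. In the general (IA)3 formulation, learned vectors rescale the attention keys, the attention values, and the intermediate activation of each feed-forward block element-wise, while every pre-trained weight stays frozen. The abstract does not say which CodeT5+ layers receive the vectors, so the following is a minimal pure-Python sketch of that general mechanism on one toy attention head and one toy feed-forward block, not the authors' implementation. The function names, the 2-dimensional weights, and the ReLU activation are all illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rescale(l: Vector, x: Vector) -> Vector:
    """Element-wise product l * x: the only operation (IA)3 adds to a layer."""
    return [li * xi for li, xi in zip(l, x)]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(xs: List[Vector], w_q: Matrix, w_k: Matrix, w_v: Matrix,
              l_k: Vector, l_v: Vector) -> List[Vector]:
    """Single-head self-attention whose keys and values are rescaled by l_k, l_v."""
    qs = [matvec(w_q, x) for x in xs]
    ks = [rescale(l_k, matvec(w_k, x)) for x in xs]
    vs = [rescale(l_v, matvec(w_v, x)) for x in xs]
    d_k = len(ks[0])
    outputs = []
    for q in qs:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in ks])
        outputs.append([sum(w * v[j] for w, v in zip(weights, vs))
                        for j in range(len(vs[0]))])
    return outputs


def feed_forward(x: Vector, w_1: Matrix, w_2: Matrix, l_ff: Vector) -> Vector:
    """Position-wise FFN whose intermediate activation is rescaled by l_ff."""
    hidden = [max(0.0, h) for h in matvec(w_1, x)]  # ReLU, assumed for simplicity
    return matvec(w_2, rescale(l_ff, hidden))


# Hypothetical frozen weights (d_model = 2, d_ff = 3); in a real model these come
# from the pre-trained checkpoint and receive no gradient updates.
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.5], [0.5, -0.5]]
W_V = [[1.0, 2.0], [0.0, 1.0]]
W_1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_2 = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]

# Trainable (IA)3 vectors, initialised to ones so the adapted layer
# initially reproduces the frozen layer exactly.
l_k, l_v, l_ff = [1.0, 1.0], [1.0, 1.0], [1.0, 1.0, 1.0]

tokens = [[1.0, 0.0], [0.0, 1.0]]
attn_out = attention(tokens, W_Q, W_K, W_V, l_k, l_v)
ffn_out = [feed_forward(h, W_1, W_2, l_ff) for h in attn_out]

frozen = sum(len(w) * len(w[0]) for w in (W_Q, W_K, W_V, W_1, W_2))
trainable = len(l_k) + len(l_v) + len(l_ff)
print(f"trainable: {trainable} of {frozen + trainable} parameters")
# prints "trainable: 7 of 31 parameters"
```

Initialising the vectors to ones means training starts from exactly the frozen model's behaviour. Only l_k, l_v and l_ff would receive gradients; in this toy they are 7 values against 24 frozen weights, but because they grow linearly with layer width while the frozen matrices grow quadratically, at CodeT5+ scale the trainable share shrinks to a small fraction of a percent, the regime of the 0.04% figure reported in the abstract.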

computer science, artificial intelligence, interdisciplinary applications

What problem does this paper attempt to address?

This paper addresses software defect detection (SDD), in particular multi-class defect detection. Existing SDD methods rely either on expert metrics or on deep learning techniques, but they suffer from limited reliability and versatility as well as high training costs. The authors propose a new framework, MSDD-(IA)3, which combines the pre-trained CodeT5+ with the (IA)3 parameter-efficient fine-tuning method to perform multi-class SDD. The main contributions of MSDD-(IA)3 are:

1. A detection model built on the pre-trained CodeT5+, which generates code representations while capturing defect-prone features.
2. A new feature sequence that enriches the input data by combining source code with expert metrics expressed in natural language.
3. To reduce the high cost of pre-trained language models, (IA)3 vectors are injected into specific layers, and only these injected parameters are updated during training.
4. Support for multi-class detection of six common types of defects in Python software.

Experimental results show that MSDD-(IA)3 outperforms existing SDD methods in weighted F1, weighted recall, weighted precision, and Matthews Correlation Coefficient. Moreover, its trainable parameters amount to only 0.04% of those of the original CodeT5+, which reduces memory overhead.
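
The four evaluation metrics named above are standard multi-class scores, although the summary does not show how the authors computed them. As a hedged sketch: weighted precision, recall and F1 are computed per class and averaged with weights proportional to each class's number of true samples (the behaviour of scikit-learn's `average="weighted"` option), and the multi-class MCC uses the confusion-matrix generalisation that scikit-learn's `matthews_corrcoef` implements. The labels "A", "B" and "C" are hypothetical stand-ins; the summary does not name the six defect types.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def weighted_prf(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Per-class precision/recall/F1, averaged with weights = class support."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    n = len(y_true)
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        actual = support[c]
        precision = tp / predicted if predicted else 0.0  # 0 when class never predicted
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = actual / n  # classes absent from y_true get weight 0
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1
    return totals


def multiclass_mcc(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Multi-class Matthews Correlation Coefficient from label counts."""
    n = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    t_counts, p_counts = Counter(y_true), Counter(y_pred)
    labels = set(t_counts) | set(p_counts)
    cov_tp = correct * n - sum(t_counts[c] * p_counts[c] for c in labels)
    cov_pp = n * n - sum(p_counts[c] ** 2 for c in labels)
    cov_tt = n * n - sum(t_counts[c] ** 2 for c in labels)
    denom = math.sqrt(cov_pp * cov_tt)
    return cov_tp / denom if denom else 0.0


# Hypothetical labels for six snippets.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "C", "C"]
m = weighted_prf(y_true, y_pred)
mcc = multiclass_mcc(y_true, y_pred)
print(f"precision={m['precision']:.4f} recall={m['recall']:.4f} "
      f"f1={m['f1']:.4f} mcc={mcc:.4f}")
# prints "precision=0.7500 recall=0.6667 f1=0.6778 mcc=0.5222"
```

Weighted recall always equals plain accuracy (here 4 correct out of 6), because each class's recall is weighted by its own support; this is why it is usually read alongside weighted precision, weighted F1 and MCC.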
