Parameter-Efficient Multi-classification Software Defect Detection Method Based on Pre-trained LLMs

Xuanye Wang, Lu Lu, Zhanyu Yang, Qingyan Tian, Haisha Lin
DOI: https://doi.org/10.1007/s44196-024-00551-3
IF: 2.259
2024-06-20
International Journal of Computational Intelligence Systems
Abstract: Software Defect Detection (SDD) has always been critical to the development life cycle. A stable defect detection system can not only alleviate the workload of software testers but also enhance the overall efficiency of software development. Researchers have recently proposed various artificial intelligence-based SDD methods and achieved significant advancements. However, these methods still exhibit limitations in terms of reliability and usability. Therefore, we introduce MSDD-(IA)3, a novel framework leveraging the pre-trained CodeT5+ and (IA)3 for parameter-efficient multi-classification SDD. This framework constructs a detection model based on pre-trained CodeT5+ to generate code representations while capturing defect-prone features. Considering the high overhead of pre-trained LLMs, we inject (IA)3 vectors into specific layers, where only these injected parameters are updated to reduce the training cost. Furthermore, leveraging the properties of the pre-trained CodeT5+, we design a novel feature sequence that enriches the input data through the combination of source code with Natural Language (NL)-based expert metrics. Our experimental results on 64K real-world Python snippets show that MSDD-(IA)3 demonstrates superior performance compared to state-of-the-art SDD methods, including PM2-CNN, in terms of F1-weighted, Recall-weighted, Precision-weighted, and Matthews Correlation Coefficient. Notably, the training parameters of MSDD-(IA)3 are only 0.04% of those of the original CodeT5+. Our experimental data and code are available at https://gitee.com/wxyzjp123/msdd-ia3/.
computer science, artificial intelligence, interdisciplinary applications
What problem does this paper attempt to address?
This paper addresses software defect detection (SDD), specifically multi-class defect detection. Existing SDD methods rely on expert metrics or deep learning techniques, but they remain limited in reliability and usability, and fine-tuning a pre-trained model carries a high training cost. The authors propose MSDD-(IA)3, a framework that combines the pre-trained CodeT5+ model with the (IA)3 strategy to achieve parameter-efficient multi-class SDD. Its main contributions are:
1. It builds the detection model on pre-trained CodeT5+, which generates code representations while capturing defect-prone features.
2. It designs a new feature sequence that enriches the input data by combining source code with natural language (NL)-based expert metrics.
3. To reduce the training cost of a pre-trained language model, it injects (IA)3 vectors into specific layers and updates only those injected parameters.
4. It supports multi-class detection of six common defect types in Python software.
Experimental results show that MSDD-(IA)3 outperforms existing SDD methods on weighted F1, weighted recall, weighted precision, and the Matthews correlation coefficient. In addition, its trainable parameters amount to only 0.04% of those of the original CodeT5+, which reduces memory overhead.
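The parameter-efficiency idea behind (IA)3 can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and does not use CodeT5+. It assumes a toy feed-forward block with random stand-in weights, a ReLU activation chosen only for simplicity, and a hypothetical class name `IA3FeedForward`. As originally described, (IA)3 also rescales the attention keys and values; the summary above does not say which layers MSDD-(IA)3 modifies, so only the feed-forward case is shown.

```python
import random


def matvec(weight, vec):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


class IA3FeedForward:
    """Toy feed-forward block: y = W2 @ (l_ff * relu(W1 @ x)).

    W1 and W2 stand in for frozen pre-trained weights (random here).
    l_ff is the injected (IA)3 vector and the only trainable parameter.
    """

    def __init__(self, d_model, d_ff, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_ff)]
        self.w2 = [[rng.uniform(-1, 1) for _ in range(d_ff)] for _ in range(d_model)]
        # Initialised to ones, so the block starts out identical to the frozen model.
        self.l_ff = [1.0] * d_ff

    def forward(self, x):
        hidden = [max(0.0, h) for h in matvec(self.w1, x)]     # frozen projection + ReLU
        scaled = [l * h for l, h in zip(self.l_ff, hidden)]    # element-wise (IA)3 rescaling
        return matvec(self.w2, scaled)                         # frozen output projection

    def frozen_param_count(self):
        return sum(len(row) for row in self.w1) + sum(len(row) for row in self.w2)

    def trainable_param_count(self):
        return len(self.l_ff)


# Usage with toy dimensions (assumptions, far smaller than any real model).
block = IA3FeedForward(d_model=8, d_ff=32)
x = [0.1 * i for i in range(8)]

baseline = block.forward(x)      # l_ff is all ones: same output as the frozen block
block.l_ff = [2.0] * 32          # pretend training moved every scale to 2.0
doubled = block.forward(x)       # every output component is now twice the baseline

ratio = block.trainable_param_count() / block.frozen_param_count()
print(f"trainable / frozen parameters: {ratio:.4f}")
```

In this toy block the injected vector adds 32 trainable values next to 512 frozen weights. The ratio shrinks as the model grows, because the vector scales with a layer's width while the weight matrices scale with the product of two widths. The 0.04% figure reported for MSDD-(IA)3 comes from the paper and is not reproduced by this sketch.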