Parameter-Efficient Multi-classification Software Defect Detection Method Based on Pre-trained LLMs
Xuanye Wang,Lu Lu,Zhanyu Yang,Qingyan Tian,Haisha Lin
DOI: https://doi.org/10.1007/s44196-024-00551-3
IF: 2.259
2024-06-20
International Journal of Computational Intelligence Systems
Abstract:Software Defect Detection (SDD) has always been critical to the development life cycle. A stable defect detection system can not only alleviate the workload of software testers but also enhance the overall efficiency of software development. Researchers have recently proposed various artificial intelligence-based SDD methods and achieved significant advancements. However, these methods still exhibit limitations in terms of reliability and usability. Therefore, we introduce MSDD-(IA) 3 , a novel framework leveraging the pre-trained CodeT5+ and (IA) 3 for parameter-efficient multi-classification SDD. This framework constructs a detection model based on pre-trained CodeT5+ to generate code representations while capturing defect-prone features. Considering the high overhead of pre-trained LLMs, we injects (IA) 3 vectors into specific layers, where only these injected parameters are updated to reduce the training cost. Furthermore, leveraging the properties of the pre-trained CodeT5+, we design a novel feature sequence that enriches the input data through the combination of source code with Natural Language (NL)-based expert metrics. Our experimental results on 64K real-world Python snippets show that MSDD-(IA) 3 demonstrates superior performance compared to state-of-the-art SDD methods, including PM2-CNN, in terms of F1-weighted, Recall-weighted, Precision-weighted, and Matthews Correlation Coefficient. Notably, the training parameters of MSDD-(IA) 3 are only 0.04% of those of the original CodeT5+. Our experimental data and code can be available at (https://gitee.com/wxyzjp123/msdd-ia3/).
computer science, artificial intelligence, interdisciplinary applications