Software Defect Prediction and Localization with Attention-Based Models and Ensemble Learning

Tianhang Zhang, Qingfeng Du, Jincheng Xu, Jiechu Li, Xiaojun Li
DOI: https://doi.org/10.1109/apsec51365.2020.00016
2020-01-01
Abstract: Software defect prediction (SDP) uses a model trained on historical defect data to predict the defect proneness of code modules in a software system. An effective model helps allocate testing resources where they are most needed, thus improving the quality of software products. Most previous studies represent code snippets with handcrafted features, which struggle to capture the semantic and structural information of the code context that is often crucial for defect prediction. Moreover, most existing SDP models cannot make predictions at the code-line level, so they offer developers little detailed guidance on where a defect is likely to be. To address these issues, we propose a model based on ensemble learning techniques and attention mechanisms that gives developers more comprehensive prediction information by locating suspect lines of code while making method-level defect predictions. The model uses abstract syntax trees (ASTs) as the intermediate representation of code snippets. Since historical defect data is strikingly class-imbalanced, an approach based on Self-Organizing Map (SOM) clustering is employed to handle noisy data. Experimental results show that, on average, the proposed model improves the F-measure by 17.7% and AUC by 37.8% compared with four other machine learning algorithms.
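To make the AST-based representation concrete, here is a minimal sketch of how a method can be turned into AST node types grouped by source line, which is the kind of mapping that lets per-node attention weights be traced back to suspect lines. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not name a parser or target language, so Python's standard `ast` module stands in, and the names `ast_node_sequence`, `tokens_by_line`, and `SAMPLE` are hypothetical.

```python
import ast


def ast_node_sequence(source: str) -> list:
    """Return (node_type, line_number) pairs for AST nodes that carry a line number.

    Assumption: node type names serve as the tokens fed to a downstream model.
    """
    tree = ast.parse(source)
    pairs = []
    for node in ast.walk(tree):
        lineno = getattr(node, "lineno", None)
        if lineno is not None:  # operator/context nodes have no position
            pairs.append((type(node).__name__, lineno))
    # Stable sort keeps traversal order within a line.
    pairs.sort(key=lambda pair: pair[1])
    return pairs


def tokens_by_line(source: str) -> dict:
    """Group AST node types by the source line they start on."""
    grouped = {}
    for name, lineno in ast_node_sequence(source):
        grouped.setdefault(lineno, []).append(name)
    return grouped


# Hypothetical method used only for illustration.
SAMPLE = '''def safe_div(a, b):
    if b == 0:
        return 0
    return a / b
'''

if __name__ == "__main__":
    for lineno, names in sorted(tokens_by_line(SAMPLE).items()):
        print(lineno, names)
```

In a setup like this, a score attached to each token (for example, an attention weight) could be summed per line to rank lines by suspiciousness; how the paper actually aggregates attention is not specified in the abstract.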