Code Smell Detection Research Based on Pre-training and Stacking Models
Dongwen Zhang, Shuai Song, Yang Zhang, Haiyang Liu, Gaojie Shen
DOI: https://doi.org/10.1109/tla.2024.10375735
IF: 0.967
2024-01-01
IEEE Latin America Transactions
Abstract: Code smell detection primarily adopts heuristic-based, machine learning, and deep learning approaches. To enhance accuracy, most studies employ deep learning methods, but the value of traditional machine learning methods should not be underestimated. Additionally, existing code smell detection methods pay insufficient attention to the textual features in the code. To address these issues, this paper proposes a code smell detection method, SCSmell. It uses static analysis tools to extract structural features, converts the code into txt format using static analysis tools, and feeds the result into the pre-trained BERT model to extract textual features. The structural features are combined with the textual features to generate sample data, and code smell instances are labeled. The RFECV (recursive feature elimination with cross-validation) method is then used to select the important structural features. To address data imbalance, the Borderline-SMOTE method is used to generate positive sample data, and a three-layer Stacking model is finally employed to detect code smells. In our experiments, we selected 44 large real-world projects as the training and testing sets and performed detection for four types of code smells: brain class, data class, God class, and brain method. The experimental results indicate that SCSmell improves the average accuracy by 10.38% compared to existing detection methods, while maintaining high precision, recall, and F1 scores. SCSmell is therefore an effective solution for code smell detection.
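The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the Borderline-SMOTE idea: synthesize new positive samples only around positive samples that sit close to the class boundary. This is not the authors' implementation. The abstract does not state which variant or parameters SCSmell uses, so the function name `borderline_smote`, the neighbourhood sizes `m` and `k`, and the toy one-dimensional data are all assumptions made for illustration. Only the Python standard library is used.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Return the k (vector, label) pairs whose vectors are closest to point."""
    return sorted(candidates, key=lambda c: math.dist(point, c[0]))[:k]


def borderline_smote(X, y, n_new, m=3, k=2, seed=0):
    """Create n_new synthetic positive (label 1) samples near the class border.

    Hypothetical helper for illustration; m and k are assumed values.
    """
    rng = random.Random(seed)
    labelled = list(zip(X, y))
    minority = [x for x, label in labelled if label == 1]
    if len(minority) < 2:
        return []

    # Step 1: a positive sample is "in danger" when at least half, but not
    # all, of its m nearest neighbours are negative (majority) samples.
    danger = []
    for p in minority:
        others = [(x, label) for x, label in labelled if x is not p]
        neighbours = k_nearest(p, others, m)
        n_majority = sum(1 for _, label in neighbours if label == 0)
        if m / 2 <= n_majority < m:
            danger.append(p)
    if not danger:
        return []

    # Step 2: interpolate between a danger sample and one of its k nearest
    # positive neighbours to obtain each synthetic sample.
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(danger)
        peers = [(x, 1) for x in minority if x is not p]
        q = rng.choice(k_nearest(p, peers, k))[0]
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return synthetic


# Toy data (assumed): six negative samples, five positive ones, two of which
# (5.5 and 6.4) lie right next to the negative class.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,),
     (5.5,), (6.4,), (9.0,), (10.0,), (11.0,)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
new_points = borderline_smote(X, y, n_new=4)
```

In this toy run, only the two positive samples adjacent to the negative class qualify as borderline, so every synthetic point is an interpolation that starts from one of them. In practice a maintained library implementation would be preferable to this hand-written sketch.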