An Unsupervised Feature Selection Approach for Actionable Warning Identification.

Xiuting Ge, Chunrong Fang, Jia Liu, Mingshuang Qing, Xuanye Li, Zhihong Zhao
DOI: https://doi.org/10.1016/j.eswa.2023.120152
IF: 8.5
2023-01-01
Expert Systems with Applications
Abstract: Static Analysis Tools (SATs) are widely applied to detect defects in software projects. However, SATs report a large number of unactionable warnings, which severely hinders their usability. To address this problem, existing approaches commonly use Machine Learning (ML) techniques for Actionable Warning Identification (AWI). For these ML-based AWI approaches, determining the warning features is one of the most critical steps in effectively identifying actionable warnings. To eliminate redundant and irrelevant warning features, ML-based AWI approaches usually incorporate feature selection, which determines the feature subset by calculating the importance of features or their correlation with warning labels. Nevertheless, warning labels are not always directly available in practice. Thus, selecting warning features for ML-based AWI approaches when warning labels are absent is both vital and challenging. To address this problem, we propose an UNsupervised fEAture SElection approach called UNEASE for ML-based AWI. (1) UNEASE first performs feature clustering to gather warning features into clusters, where the number of clusters is determined automatically and features in the same cluster are considered redundant. (2) Subsequently, UNEASE performs feature ranking to sort the warning features in each cluster with three newly proposed ranking strategies and selects the top-ranked warning feature from each cluster. Based on the selected features, we train an ML classifier to identify actionable warnings. We conduct experiments on eight large-scale, real-world warning datasets. Compared with nine typical feature reduction techniques, UNEASE obtains the top-ranked AUC while keeping the cost of feature selection low and maintaining a low redundancy rate among the selected warning features.
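The two-step idea in the abstract (cluster redundant features without labels, then keep the top-ranked feature of each cluster) can be illustrated with a minimal sketch. This is not the UNEASE implementation: the abstract names neither the clustering algorithm nor the three ranking strategies, so the greedy correlation-threshold clustering, the variance-based ranking, the `threshold` parameter, and the feature names below are all assumptions chosen only for illustration.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def cluster_features(columns, threshold=0.9):
    """Group features without using labels (assumed stand-in for the paper's clustering).

    A feature joins the first cluster whose first member it correlates with
    at |r| >= threshold; otherwise it starts a new cluster. The number of
    clusters therefore falls out of the data rather than being fixed up front.
    """
    clusters = []
    for name in columns:
        for cluster in clusters:
            if abs(pearson(columns[name], columns[cluster[0]])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def select_features(columns, threshold=0.9):
    """Keep one feature per cluster, ranked by variance (an assumed ranking strategy)."""
    clusters = cluster_features(columns, threshold)
    return [max(c, key=lambda n: pvariance(columns[n])) for c in clusters]


# Hypothetical warning features: one column of values per feature.
columns = {
    "warning_priority": [1, 2, 3, 4, 5],
    "warning_priority_scaled": [2, 4, 6, 8, 10],  # redundant with the first
    "file_age_days": [5, 1, 4, 2, 3],
}
clusters = cluster_features(columns)
selected = select_features(columns)
print(clusters)
print(selected)
```

The selected columns would then be passed to any ML classifier for training. No warning labels are consulted at any point during the selection, which is the property the abstract emphasizes.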