Abstract: Static Analysis Tools (SATs) are widely applied to detect defects in software projects. However, SATs often report a large number of unactionable warnings, which severely hinders their usability. To address this problem, existing approaches commonly use Machine Learning (ML) techniques for Actionable Warning Identification (AWI). For these ML-based AWI approaches, determining the warning features is one of the most critical steps in identifying actionable warnings effectively. To eliminate redundant and irrelevant warning features, ML-based AWI approaches usually incorporate feature selection, which determines a feature subset by calculating the importance of features or their correlation with warning labels. Nevertheless, warning labels are not always directly available in practice. Thus, selecting warning features for ML-based AWI approaches when warning labels are absent is both vital and challenging.

To address the above problem, we propose an UNsupervised fEAture SElection approach called UNEASE for ML-based AWI. (1) UNEASE first performs feature clustering to group warning features into clusters, where the number of clusters is determined automatically and features in the same cluster are considered redundant. (2) Subsequently, UNEASE performs feature ranking to sort the warning features in each cluster with three newly proposed ranking strategies and selects the top-ranked warning feature from each cluster. Based on the selected features, we train an ML classifier to identify actionable warnings. We conduct experiments on eight large-scale, real-world warning datasets. Compared with nine typical feature reduction techniques, UNEASE obtains the top-ranked AUC while incurring a low feature selection cost and maintaining a low redundancy rate among the selected warning features.
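
The two-step idea in the abstract (cluster redundant features without labels, then keep one top-ranked feature per cluster) can be illustrated with a minimal sketch. It is not the UNEASE implementation: the abstract does not specify the clustering algorithm or the three ranking strategies, so this sketch assumes correlation-threshold clustering (connected components over feature pairs with |Pearson r| at or above a cutoff) and a single variance-based ranking as stand-ins. The function names, the 0.8 threshold, and the toy feature table are all hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def cluster_features(columns, threshold=0.8):
    """Step 1 (assumed stand-in): group features whose |r| >= threshold.

    Clusters are the connected components of the "highly correlated" graph,
    so the number of clusters is not fixed in advance; it follows from the data.
    """
    names = list(columns)
    parent = {n: n for n in names}

    def find(n):  # union-find root lookup with path halving
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())


def rank_by_variance(columns, cluster):
    """Step 2 (assumed stand-in): sort a cluster's features by variance, highest first.

    Variance is scale-sensitive; real use would normalize features first.
    """
    return sorted(cluster, key=lambda n: pvariance(columns[n]), reverse=True)


def select_features(columns, threshold=0.8):
    """Keep the top-ranked feature from each cluster; no warning labels are used."""
    return [rank_by_variance(columns, c)[0]
            for c in cluster_features(columns, threshold)]


# Hypothetical warning feature table: one list of values per feature.
warning_table = {
    "file_loc": [120, 300, 80, 450, 210],
    "method_loc": [12, 31, 9, 44, 20],  # nearly proportional to file_loc
    "warning_priority": [1, 3, 2, 1, 3],
    "file_age_days": [400, 30, 250, 90, 700],
}

print(select_features(warning_table))
# → ['file_loc', 'warning_priority', 'file_age_days']
```

In this toy table, `file_loc` and `method_loc` fall into one cluster and only one of them survives, while the two weakly correlated features each form their own cluster. The selected columns would then be passed to whichever ML classifier is trained to identify actionable warnings.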

AW4C: A Commit-Aware C Dataset for Actionable Warning Identification

A Large-Scale Empirical Study of Actionable Warning Distribution Within Projects

ACWRecommender: A Tool for Validating Actionable Warnings with Weak Supervision

Machine Learning for Actionable Warning Identification: A Comprehensive Survey

Learning to Recognize Actionable Static Code Warnings (is Intrinsically Easy)

How to Find Actionable Static Analysis Warnings: A Case Study with FindBugs

Pre-trained Model-based Actionable Warning Identification: A Feasibility Study

Improving actionable warning identification via the refined warning-inducing context representation

An Empirical Study of Class Rebalancing Methods for Actionable Warning Identification

Fine-grained Commit-level Vulnerability Type Prediction by CWE Tree Structure

An Unsupervised Feature Selection Approach for Actionable Warning Identification

Automatically Inspecting Thousands of Static Bug Warnings with Large Language Model: How Far Are We?

Automatic Construction of an Effective Training Set for Prioritizing Static Analysis Warnings

A Study of Static Warning Cascading Tools (Experience Paper)

Improving the detection of technical debt in Java source code with an enriched dataset

Automated Static Warning Identification via Path-based Semantic Representation

FineWAVE: Fine-Grained Warning Verification of Bugs for Automated Static Analysis Tools

Analyzing source code vulnerabilities in the D2A dataset with ML ensembles and C-BERT

DAppSCAN: Building Large-Scale Datasets for Smart Contract Weaknesses in DApp Projects

SATDAUG -- A Balanced and Augmented Dataset for Detecting Self-Admitted Technical Debt