Abstract: Vulnerability detectors based on deep learning (DL) models have proven their effectiveness in recent years. However, the opacity of their decision-making process makes their predictions difficult for security analysts to comprehend. To address this, various explanation approaches have been proposed that explain predictions by highlighting important features, and these approaches have proven effective in other domains such as computer vision and natural language processing. Unfortunately, an in-depth evaluation of whether these explanation approaches capture vulnerability-critical features, such as fine-grained vulnerability-related code lines, is still lacking. In this study, we first evaluate the performance of ten explanation approaches for vulnerability detectors based on graph and sequence representations, measured by two quantitative metrics: fidelity and vulnerability line coverage rate. Our results show that fidelity alone is not sufficient for evaluating these approaches, because fidelity fluctuates significantly across different datasets and detectors. We subsequently check the precision of the vulnerability-related code lines reported by the explanation approaches and find that all of them perform poorly at this task. This can be attributed to the inefficiency of explainers in selecting important features and to the irrelevant artifacts learned by DL-based detectors.
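The abstract names fidelity as one of its two metrics without defining it. A common formulation in the explainability literature measures how much the detector's confidence in its prediction drops when the features an explainer ranks most important are removed. The sketch below assumes that formulation, treats each code line as one feature, and "removes" a line by blanking it; `fidelity_plus`, `predict_proba` and `toy_detector` are hypothetical names, not the paper's implementation, and the paper's exact definition may differ.

```python
from typing import Callable, List, Sequence


def fidelity_plus(
    predict_proba: Callable[[Sequence[str]], float],
    code_lines: Sequence[str],
    importance: Sequence[float],
    k: int,
    mask_token: str = "",
) -> float:
    """Confidence drop after masking the k lines an explainer ranks highest.

    Assumed definition (a 'fidelity+' style score): p(original) - p(masked),
    where p is the detector's probability for the vulnerable class.
    """
    if len(importance) != len(code_lines):
        raise ValueError("expected one importance score per code line")
    # Indices of the k highest-scoring lines according to the explainer.
    ranked = sorted(range(len(code_lines)), key=lambda i: importance[i], reverse=True)
    top_k = set(ranked[:k])
    # Blank out the selected lines; all other lines are kept unchanged.
    masked: List[str] = [mask_token if i in top_k else line for i, line in enumerate(code_lines)]
    return predict_proba(code_lines) - predict_proba(masked)


def toy_detector(lines: Sequence[str]) -> float:
    # Hypothetical stand-in for a DL detector: "vulnerable" iff strcpy appears.
    return 0.75 if any("strcpy" in line for line in lines) else 0.25


sample = ["char buf[8];", "strcpy(buf, input);", "return 0;"]
print(fidelity_plus(toy_detector, sample, [0.2, 0.7, 0.1], k=1))  # → 0.5
print(fidelity_plus(toy_detector, sample, [0.9, 0.1, 0.0], k=1))  # → 0.0
```

A high score only says that the masked lines mattered to the detector; it does not say whether they are the lines that actually trigger the vulnerability, which is why fidelity is paired with a line coverage metric.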

What problem does this paper attempt to address?

This paper addresses the interpretability and accuracy problems of deep-learning (DL) models in vulnerability detection. Although DL-based vulnerability detectors have proven effective, their decision-making processes are often opaque, which makes it difficult for security analysts to understand their predictions. Researchers have proposed various explanation methods that interpret predictions by highlighting important features. However, existing research lacks an in-depth evaluation of these methods for vulnerability detection, especially of their ability to identify fine-grained, vulnerability-related code lines. The paper therefore aims to:

1. **Evaluate the effectiveness of existing explanation methods**: The researchers evaluated ten explanation methods on graph- and sequence-based vulnerability detectors using two quantitative metrics, fidelity and vulnerability line coverage rate. They found that fidelity fluctuates significantly across datasets and detectors, so relying on it alone cannot accurately reflect how well an explanation method locates critical vulnerability-related code lines.
2. **Reveal the limitations of existing explanation methods**: Existing explanation methods perform poorly at precisely locating vulnerability-related code lines, even when they achieve high fidelity scores. This indicates that they are deficient in selecting important features and are easily influenced by irrelevant features.
3. **Propose new evaluation metrics**: To evaluate explanation methods more comprehensively, the researchers introduced two sub-metrics, vulnerability Triggering Line Coverage (TLC) and vulnerability Fixing Line Coverage (FLC), which measure how well explanation methods identify and explain vulnerability-related code lines.
4. **Analyze the limitations of deep-learning detectors**: The study also revealed that DL-based vulnerability detectors are limited in learning vulnerability-triggering mechanisms. Although these detectors can distinguish vulnerable from normal samples, they often fail to capture how vulnerabilities are actually triggered and are easily influenced by irrelevant perturbations.

### Main contributions

- **Enhanced evaluation metrics for explanation methods**: The new sub-metrics TLC and FLC evaluate more accurately how well explanation methods locate vulnerabilities.
- **A new evaluation metric, VUR**: Used to evaluate the sensitivity of DL-based detectors to irrelevant perturbations.
- **Limitations of DL-based detectors**: The paper points out the deficiencies of current detectors in understanding and identifying vulnerability-triggering mechanisms.

In summary, through systematic evaluation and analysis, this paper reveals the limitations of existing explanation methods and DL-based vulnerability detectors and proposes improvements to evaluation metrics, with the aim of improving the accuracy and interpretability of vulnerability detection.
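TLC and FLC are described here only in prose. One plausible reading, assumed in the sketch below, is that each measures the share of ground-truth lines (vulnerability-triggering lines for TLC, lines touched by the fixing patch for FLC) that appear among the top-k lines an explainer reports. The precision check mentioned in the abstract is then the complementary view: the share of reported lines that are actually vulnerability-related. The helper names, the value of k and the line sets are hypothetical, and the paper's exact formulas may differ.

```python
from typing import Dict, Iterable, Set


def top_k_lines(line_scores: Dict[int, float], k: int) -> Set[int]:
    """Line numbers of the k lines the explainer scores highest."""
    ranked = sorted(line_scores, key=lambda ln: line_scores[ln], reverse=True)
    return set(ranked[:k])


def line_coverage(reported: Set[int], ground_truth: Iterable[int]) -> float:
    """Fraction of ground-truth lines that appear among the reported lines (assumed definition)."""
    truth = set(ground_truth)
    if not truth:
        raise ValueError("ground-truth line set must not be empty")
    return len(reported & truth) / len(truth)


def line_precision(reported: Set[int], ground_truth: Iterable[int]) -> float:
    """Fraction of reported lines that are ground-truth vulnerability-related lines."""
    if not reported:
        return 0.0
    return len(reported & set(ground_truth)) / len(reported)


# Hypothetical explainer output: line number -> importance score.
scores = {3: 0.9, 7: 0.8, 8: 0.1, 12: 0.6}
triggering = {7, 8}  # hypothetical vulnerability-triggering lines
fixing = {7}         # hypothetical lines changed by the fixing patch

reported = top_k_lines(scores, k=2)                   # {3, 7}
print(line_coverage(reported, triggering))            # TLC → 0.5
print(line_coverage(reported, fixing))                # FLC → 1.0
print(line_precision(reported, triggering | fixing))  # → 0.5
```

Keeping coverage and precision separate makes the trade-off visible: coverage can only stay the same or rise as k grows, while precision may fall as more irrelevant lines are reported.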

Beyond Fidelity: Explaining Vulnerability Localization of Learning-based Detectors
