Abstract: Predictions made by graph neural networks (GNNs) usually lack interpretability due to their complex computational behavior and the abstract nature of graphs. In an attempt to tackle this, many GNN explanation methods have emerged. Their goal is to explain a model's predictions and thereby obtain trust when GNN models are deployed in decision-critical applications. Most GNN explanation methods work in a post-hoc manner and provide explanations in the form of a small subset of important edges and/or nodes. In this paper we demonstrate that these explanations can unfortunately not be trusted, as common GNN explanation methods turn out to be highly susceptible to adversarial perturbations. That is, even small perturbations of the original graph structure that preserve the model's predictions may yield drastically different explanations. This calls into question the trustworthiness and practical utility of post-hoc explanation methods for GNNs. To be able to attack GNN explanation models, we devise a novel attack method dubbed GXAttack, the first optimization-based adversarial white-box attack method for post-hoc GNN explanations under such settings. Due to the devastating effectiveness of our attack, we call for an adversarial evaluation of future GNN explainers to demonstrate their robustness. For reproducibility, our code is available via GitHub.

What problem does this paper attempt to address?

The paper addresses the vulnerability of graph neural network (GNN) explanation methods to adversarial attacks. Although many existing methods can explain GNN predictions to some extent, their explanations are easily affected by small adversarial perturbations: even perturbations that leave the model's prediction unchanged can substantially change the explanation. This calls into question the credibility and practical utility of these methods. The main contributions of the paper are as follows:

1. **Reveal the problem**: Experiments show that common GNN explanation methods are sensitive to adversarial perturbations. Even small changes to the original graph structure that keep the prediction unchanged may lead to large changes in the explanation.
2. **Propose an attack method**: The authors design GXAttack, a new optimization-based adversarial white-box attack that specifically targets GNN explanation methods. It generates carefully designed perturbations that are small and imperceptible, yet significantly change the explanation without changing the model's prediction.
3. **Evaluate the attack**: Experiments on multiple datasets verify the effectiveness of GXAttack and show that other widely used GNN explanation methods (such as GradCAM, GNNExplainer and SubgraphX) are also vulnerable to the optimized attack.

Because GXAttack is so effective, the authors call on future research to evaluate GNN explanation methods under adversarial attack, so that their robustness, reliability and credibility in practical applications can be demonstrated.
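The success criterion described above (prediction unchanged, small perturbation, very different explanation) can be illustrated with a minimal sketch. This is not GXAttack and not the paper's evaluation code: the function names, the top-k overlap measure, the edge-flip budget and the toy importance scores are all assumptions made for illustration only.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def top_k_edges(scores: Dict[Edge, float], k: int) -> Set[Edge]:
    """Return the k edges with the highest importance scores (ties broken by edge id)."""
    ranked = sorted(scores, key=lambda e: (-scores[e], e))
    return set(ranked[:k])


def explanation_overlap(before: Dict[Edge, float],
                        after: Dict[Edge, float],
                        k: int) -> float:
    """Fraction of the original top-k edges that are still in the top-k afterwards."""
    top_before = top_k_edges(before, k)
    top_after = top_k_edges(after, k)
    if not top_before:
        return 1.0
    return len(top_before & top_after) / len(top_before)


def attack_summary(pred_before: int, pred_after: int,
                   before: Dict[Edge, float], after: Dict[Edge, float],
                   k: int, budget: int) -> Dict[str, object]:
    """Check the three conditions: same prediction, few flipped edges, changed explanation."""
    # Edges present in exactly one of the two graphs were added or removed.
    flipped = set(before) ^ set(after)
    return {
        "prediction_preserved": pred_before == pred_after,
        "within_budget": len(flipped) <= budget,
        "overlap": explanation_overlap(before, after, k),
    }


# Hypothetical edge-importance scores from some explainer, before and after
# adding the single edge (0, 4) to the graph.
scores_before = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.2}
scores_after = {(0, 1): 0.1, (1, 2): 0.2, (2, 3): 0.9, (3, 4): 0.7, (0, 4): 0.8}

summary = attack_summary(pred_before=1, pred_after=1,
                         before=scores_before, after=scores_after,
                         k=2, budget=1)
print(summary)
```

In this toy case one added edge leaves the predicted class unchanged but replaces both of the top-2 explanation edges, which is the kind of outcome the summary describes. A real evaluation would obtain the predictions and scores from a trained GNN and an actual explainer.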
