Abstract: Recent studies have shown that graph neural networks (GNNs) are highly susceptible to adversarial attacks. Among these, graph backdoor attacks are one of the most prominent threats: the attacker injects triggers and modifies target labels during the training phase, so that the model learns the backdoored features and misclassifies triggered inputs. Based on the features of the triggers, these attacks can be categorized into out-of-distribution (OOD) and in-distribution (ID) graph backdoor attacks. In OOD attacks, the trigger features differ notably from the feature distribution of clean samples, whereas in ID attacks the trigger features are nearly identical to that distribution. Existing methods can defend against OOD backdoor attacks by comparing the feature distributions of triggers and clean samples, but they fail to mitigate stealthy ID backdoor attacks. Moreover, because proper supervision signals are lacking, defending against ID backdoor attacks degrades main-task accuracy. To bridge this gap, we propose DMGNN, a defense against both OOD and ID graph backdoor attacks that eliminates the stealthiness of triggers to guarantee defense effectiveness and improve model performance. Specifically, DMGNN identifies hidden ID and OOD triggers by predicting label transitions based on counterfactual explanation. To further filter the diversity of the generated explainable graphs and erase the influence of trigger features, we present a reverse sampling pruning method that screens and discards triggers directly at the data level. Extensive experimental evaluations on open graph datasets demonstrate that DMGNN far outperforms state-of-the-art (SOTA) defense methods, reducing the attack success rate to 5% with almost negligible degradation in model performance (within 3.5%).
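
The label-transition idea in the abstract can be illustrated with a toy sketch. This is not DMGNN's implementation, and every name in it is an assumption made for illustration: `predict` is a hypothetical stand-in for a trained GNN, single-node removal stands in for a counterfactual explainer, and dropping the flagged nodes stands in for pruning at the data level. The example trigger is also deliberately easy to spot (an outlier feature), so it says nothing about the harder ID case.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # adjacency: node -> set of neighbours


def predict(adj: Graph, feats: Dict[int, float]) -> int:
    """Hypothetical stand-in for a trained GNN (an assumption, not DMGNN).

    One round of mean aggregation over each node and its neighbours,
    a mean readout over nodes, then a fixed threshold at 0.5.
    """
    if not adj:
        return 0
    hidden = []
    for v, nbrs in adj.items():
        vals = [feats[v]] + [feats[u] for u in nbrs]
        hidden.append(sum(vals) / len(vals))
    return int(sum(hidden) / len(hidden) > 0.5)


def remove_nodes(adj: Graph, drop: Set[int]) -> Graph:
    """Return a copy of the graph without the nodes in `drop`."""
    return {
        v: {u for u in nbrs if u not in drop}
        for v, nbrs in adj.items()
        if v not in drop
    }


def label_flipping_nodes(adj: Graph, feats: Dict[int, float]) -> List[int]:
    """Flag nodes whose removal alone changes the predicted label."""
    base = predict(adj, feats)
    return [
        v for v in sorted(adj)
        if predict(remove_nodes(adj, {v}), feats) != base
    ]


# Toy graph: a clean path 0-1-2 plus an injected node 3 attached to node 1.
adj = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}
feats = {0: 0.1, 1: 0.2, 2: 0.1, 3: 5.0}

flagged = label_flipping_nodes(adj, feats)       # candidate trigger nodes
pruned = remove_nodes(adj, set(flagged))         # discard them from the data
print(flagged, predict(adj, feats), predict(pruned, feats))
```

In this toy setting, only the injected node changes the prediction when removed, so it is the only node flagged, and the pruned graph reverts to the other label. A real defense would search over subgraphs rather than single nodes and would need a criterion for separating triggers from legitimately influential structure.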

DMGNN: Detecting and Mitigating Backdoor Attacks in Graph Neural Networks
