Abstract: Explainability is crucial for probing graph neural networks (GNNs), answering questions such as “Why does the GNN model make a certain prediction?” Feature attribution is a prevalent technique for highlighting the explanatory subgraph in the input graph, which plausibly leads the GNN model to make its prediction. Various attribution methods have been proposed that use gradient-like or attention scores as the attributions of edges and then select the salient edges with the top attribution scores as the explanation. However, most of these works make an untenable assumption, namely that the selected edges are linearly independent, and thus leave the dependencies among edges largely unexplored, especially their coalition effect. We demonstrate clear drawbacks of this assumption: it makes the explanatory subgraph unfaithful and verbose. To address this challenge, we propose a reinforcement learning agent, Reinforced Causal Explainer (RC-Explainer). It frames the explanation task as a sequential decision process, in which an explanatory subgraph is constructed step by step by adding a salient edge that connects to the previously selected subgraph. Technically, its policy network predicts the action of edge addition and receives a reward that quantifies the action’s causal effect on the prediction. Such a reward accounts for the dependency between the newly added edge and the previously added edges, thus reflecting whether they collaborate and form a coalition that yields a better explanation. The agent is trained via policy gradient to optimize the reward stream of edge sequences. As such, RC-Explainer is able to generate faithful and concise explanations and generalizes better to unseen graphs. When explaining different GNNs on three graph classification datasets, RC-Explainer achieves better or comparable performance to state-of-the-art approaches w.r.t. two quantitative metrics, predictive accuracy and contrastivity, and safely passes sanity checks and visual inspections.
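The drawback of scoring edges independently, and the benefit of conditioning each edge's reward on the edges already chosen, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not RC-Explainer itself: the GNN is replaced by a hypothetical hand-written function `toy_model_prob`, the learned policy network is replaced by a one-step greedy rule, and the reward is assumed to be the change in predicted class probability. All names (`toy_model_prob`, `independent_top_k`, `touches`, `sequential_greedy`, `EDGES`) are illustrative. Edges (1, 2) and (1, 3) are redundant here, so independent top-2 selection keeps both, while the sequential rule sees that (1, 3) adds nothing once (1, 2) is present.

```python
import math

# Hypothetical stand-in for a trained GNN (NOT a real GNN): maps the set of
# kept edges to the predicted probability of the target class.
def toy_model_prob(edges):
    logit = -2.0
    if (1, 2) in edges or (1, 3) in edges:  # redundant pair: either edge alone suffices
        logit += 2.0
    if (0, 1) in edges:  # complementary evidence
        logit += 1.0
    return 1.0 / (1.0 + math.exp(-logit))

EDGES = [(0, 1), (1, 2), (1, 3), (3, 4)]

def independent_top_k(edges, k):
    """Score each edge in isolation, then keep the k highest scores."""
    base = toy_model_prob(set())
    scores = {e: toy_model_prob({e}) - base for e in edges}
    return sorted(edges, key=lambda e: -scores[e])[:k]

def touches(edge, selected):
    """True if the edge shares a node with the subgraph selected so far."""
    nodes = {n for e in selected for n in e}
    return not selected or edge[0] in nodes or edge[1] in nodes

def sequential_greedy(edges, k):
    """Add one connected edge at a time; a candidate's reward is its effect
    on the prediction given the edges already chosen."""
    selected = []
    for _ in range(k):
        current = toy_model_prob(set(selected))
        cands = [e for e in edges if e not in selected and touches(e, selected)]
        if not cands:
            break
        best = max(cands, key=lambda e: toy_model_prob(set(selected) | {e}) - current)
        selected.append(best)
    return selected

top2 = independent_top_k(EDGES, 2)   # [(1, 2), (1, 3)]
seq2 = sequential_greedy(EDGES, 2)   # [(1, 2), (0, 1)]
print(toy_model_prob(set(top2)), toy_model_prob(set(seq2)))
```

With the same budget of two edges, the independent selection reaches a class probability of 0.5, while the sequential selection reaches about 0.731, because the second edge is scored by its marginal effect rather than its standalone effect.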
Code and datasets are available at https://github.com/xiangwang1223/reinforced_causal_explainer.
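Training "via policy gradient to optimize the reward stream of edge sequences" can be sketched with plain REINFORCE. This is a minimal sketch under stated assumptions, not the released implementation: the policy is a hypothetical tabular softmax with one logit per edge (instead of a policy network over graph representations), the GNN is a hypothetical hand-written `toy_model_prob`, and the per-step reward is assumed to be the change in predicted class probability, so the rewards of an episode telescope to the effect of the final subgraph. In this toy model, edges (0, 1) and (1, 3) matter mainly as a coalition, while (1, 2) looks best on its own.

```python
import math
import random

# Hypothetical stand-in for a GNN: (0, 1) and (1, 3) form a coalition.
def toy_model_prob(edges):
    logit = -2.0
    logit += 1.5 * ((1, 2) in edges)
    logit += 0.5 * ((0, 1) in edges) + 0.5 * ((1, 3) in edges)
    logit += 4.0 * ((0, 1) in edges and (1, 3) in edges)  # coalition effect
    return 1.0 / (1.0 + math.exp(-logit))

EDGES = [(0, 1), (1, 2), (1, 3), (3, 4)]
BUDGET = 2  # number of edges in the explanatory subgraph

def candidates(selected):
    """Unselected edges that connect to the subgraph built so far."""
    nodes = {n for e in selected for n in e}
    return [e for e in EDGES if e not in selected
            and (not selected or e[0] in nodes or e[1] in nodes)]

def softmax(theta, cands):
    m = max(theta[e] for e in cands)
    w = {e: math.exp(theta[e] - m) for e in cands}
    z = sum(w.values())
    return {e: w[e] / z for e in cands}

def train(episodes=2000, lr=0.5, seed=0):
    rng = random.Random(seed)
    theta = {e: 0.0 for e in EDGES}  # hypothetical tabular policy logits
    baseline = [0.0] * BUDGET        # moving-average baseline per step
    for _ in range(episodes):
        selected, rewards, steps = [], [], []
        for _ in range(BUDGET):
            cands = candidates(selected)
            probs = softmax(theta, cands)
            action = rng.choices(cands, weights=[probs[e] for e in cands])[0]
            # Reward: effect of the new edge given the previously added edges.
            before = toy_model_prob(set(selected))
            selected.append(action)
            rewards.append(toy_model_prob(set(selected)) - before)
            steps.append((action, probs))
        # REINFORCE update with reward-to-go minus a per-step baseline.
        for t, (action, probs) in enumerate(steps):
            ret = sum(rewards[t:])
            adv = ret - baseline[t]
            baseline[t] = 0.9 * baseline[t] + 0.1 * ret
            for e, p in probs.items():  # gradient of log-softmax w.r.t. theta[e]
                theta[e] += lr * adv * ((e == action) - p)
    return theta

def decode(theta):
    """Follow the most probable connected edge at each step."""
    selected = []
    for _ in range(BUDGET):
        selected.append(max(candidates(selected), key=lambda e: theta[e]))
    return selected

theta = train()
print(decode(theta))  # expected to contain the coalition (0, 1) and (1, 3)
```

A one-step greedy rule would start with (1, 2), the edge with the largest standalone effect, and stop at a class probability of 0.5. Optimizing the cumulative reward instead favors the pair (0, 1) and (1, 3), which reaches about 0.953.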

Causal Feature Attribution: Towards a Trustworthy and Actionable Explanations of Deep Neural Network

Causality-Aware Local Interpretable Model-Agnostic Explanations

Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning

Counterfactual Explanations of Black-box Machine Learning Models using Causal Discovery with Applications to Credit Rating

Task-Driven Causal Feature Distillation: Towards Trustworthy Risk Prediction

Neural Networks Decoded: Targeted and Robust Analysis of Neural Network Decisions via Causal Explanations and Reasoning

Causality in Neural Networks -- An Extended Abstract

From Attribution Maps to Human-Understandable Explanations through Concept Relevance Propagation

How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors?

Causal Learning and Explanation of Deep Neural Networks via Autoencoded Activations

Visual Interpretable and Explainable Deep Learning Models for Brain Tumor MRI and COVID-19 Chest X-ray Images

Reinforced Causal Explainer for Graph Neural Networks

OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks

Explanatory causal effects for model agnostic explanations

Explaining Deep Learning Models using Causal Inference

Interpretable Deep Learning Models: Enhancing Transparency and Trustworthiness in Explainable AI

Improving Explainability of Disentangled Representations using Multipath-Attribution Mappings

Solving the enigma: Deriving optimal explanations of deep networks

On the Connection between Game-Theoretic Feature Attributions and Counterfactual Explanations

A Trustworthy Counterfactual Explanation Method With Latent Space Smoothing

Gradient based Feature Attribution in Explainable AI: A Technical Review