Abstract: Deep neural networks (DNNs) can accurately decode task-related information from brain activations. However, because of the non-linearity of DNNs, it is generally difficult to explain how and why they assign certain behavioral tasks to given brain activations, either correctly or incorrectly. One promising approach to explaining such a black-box system is counterfactual explanation. In this framework, realistic synthetic data are generated specifically so that the black-box system outputs a counterfactual outcome, that is, an outcome different from the one it assigns to the real data. The system's decision can then be explained by directly comparing the real and synthetic data. Recently, by taking advantage of advances in DNN-based image-to-image translation, several studies successfully applied counterfactual explanation to image domains. In principle, the same approach could be applied to functional magnetic resonance imaging (fMRI) data. Because fMRI datasets often contain multiple classes (e.g., multiple behavioral tasks), an image-to-image transformation suitable for counterfactual explanation must learn mappings among multiple classes simultaneously. A generative neural network that enables image-to-image transformation among multiple classes (StarGAN) has recently been developed. By adapting StarGAN with some modifications, we introduce here a novel generative DNN (counterfactual activation generator, CAG) that can provide counterfactual explanations for DNN-based classifiers of brain activations. Importantly, CAG can simultaneously handle image transformation among all seven classes in a publicly available fMRI dataset. Thus, CAG could provide a counterfactual explanation of DNN-based multiclass classifiers of brain activations. Furthermore, iterative applications of CAG were able to enhance and extract subtle spatial brain activity patterns that affected the classifier's decisions.
Together, these results demonstrate that counterfactual explanation based on image-to-image transformation is a promising approach for understanding and extending the current application of DNNs in fMRI analyses.
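
The comparison step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the classifier is a toy linear model, and `toy_generator` is a hypothetical stand-in that nudges the input along a weight difference. It is not CAG, StarGAN, or any trained generative network. The sketch only shows the general idea of producing a synthetic input that the classifier assigns to another class, applying the generator iteratively, and reading the explanation from the difference between real and synthetic data.

```python
import numpy as np


def classify(weights, x):
    """Toy linear classifier: index of the highest-scoring class."""
    return int(np.argmax(weights @ x))


def toy_generator(weights, x, target, step=0.5):
    """Hypothetical stand-in for a trained generator (NOT the CAG model).

    Moves x a small step from the currently predicted class toward
    `target`, using the difference between the two class weight vectors.
    """
    source = classify(weights, x)
    if source == target:
        return x.copy()  # already classified as the target: leave unchanged
    return x + step * (weights[target] - weights[source])


def counterfactual_map(weights, x, target, n_iter=1, step=0.5):
    """Apply the generator n_iter times.

    Returns the synthetic (counterfactual) input and the difference map
    between the synthetic and the real input.
    """
    x_cf = x.copy()
    for _ in range(n_iter):
        x_cf = toy_generator(weights, x_cf, target, step)
    return x_cf, x_cf - x


if __name__ == "__main__":
    # Assumed toy setup: 7 classes, 50 "voxels", one-hot class weights.
    weights = np.eye(7, 50)
    x_real = weights[2].copy()
    x_cf, diff = counterfactual_map(weights, x_real, target=5, n_iter=3)
    print("real class:", classify(weights, x_real))
    print("counterfactual class:", classify(weights, x_cf))
    print("voxels changed:", np.flatnonzero(diff))
```

In this toy setting the difference map is nonzero only at the voxels that separate the two classes, which is the kind of pattern a counterfactual comparison is meant to expose. A real generator would additionally be constrained to keep the synthetic activation realistic.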

Quantitative Attributions with Counterfactuals

DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations

Visual Explanations with Attributions and Counterfactuals on Time Series Classification

Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals

Generating Counterfactual Explanations with Natural Language

Better Understanding Differences in Attribution Methods via Systematic Evaluations

Explaining with Counter Visual Attributes and Examples

Quantitative and Qualitative Evaluation of Explainable Deep Learning Methods for Ophthalmic Diagnosis

Relevant Irrelevance: Generating Alterfactual Explanations for Image Classifiers

Integrated Gradient Correlation: a Dataset-wise Attribution Method

Precise Benchmarking of Explainable AI Attribution Methods

Counterfactual visual explanations

Exposing Image Classifier Shortcuts with Counterfactual Frequency (CoF) Tables

Illuminating Salient Contributions in Neuron Activation with Attribution Equilibrium

Counterfactual Explanation of Brain Activity Classifiers Using Image-To-Image Transfer by Generative Adversarial Network

Saliency-driven explainable deep learning in medical imaging: bridging visual explainability and statistical quantitative analysis

Understanding contributing neurons via attribution visualization

Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals

Less is More: Discovering Concise Network Explanations