Abstract: Class activation mapping (CAM) methods achieve strong explainability for CNNs. However, they perform less well for Transformers, whose architectures differ fundamentally from those of CNNs. Instead, gradient-weighted attention visualization methods, which account for self-attention and skip connections, achieve very promising explainability for Transformers. These methods compute gradients by back-propagation to obtain class-specific and accurate explanations. In this work, to further increase the accuracy and efficiency of Transformer explainability, we propose a novel method that is both class-specific and gradient-free. Token importance is calculated using the Shapley value method, which has a solid foundation in game theory but is conventionally very computationally expensive to use in practice. To calculate the Shapley value accurately and efficiently for each token, we decouple the self-attention from the information flow in Transformers and freeze other unrelated values. In this way, we construct a linear version of the Transformer so that the Shapley values can be calculated conveniently. Using Shapley values for explainability, our method not only improves explainability further but also becomes class-specific without using gradients, surpassing gradient-based methods in both accuracy and efficiency. Furthermore, we show that explainability methods for CNNs and Transformers can be bridged under the first-order Taylor expansion of our method, resulting in (1) a significant explainability improvement for a modified GradCAM method in Transformers and (2) new insights into the existing gradient-based attention visualization methods. Extensive experiments show that our method is superior to state-of-the-art methods. Our code will be made available.
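
To illustrate why a linear model makes Shapley values convenient to compute, the following minimal sketch compares exact Shapley values, obtained by enumerating every coalition, with the closed form that holds for an additive (linear) value function. It is not the method proposed in this work: the `exact_shapley` helper, the `contributions` list, and the `linear_value` function are hypothetical names and toy numbers chosen only for illustration, and the "tokens" here are plain indices rather than Transformer tokens.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n):
    """Exact Shapley values for n players by enumerating all coalitions.

    value_fn takes a frozenset of player indices and returns a float.
    The cost grows as O(n * 2^(n-1)) evaluations, which is why the exact
    computation is impractical for more than a handful of players.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical per-token contributions of a toy additive (linear) model.
contributions = [0.5, -1.0, 2.0, 0.25]


def linear_value(coalition):
    """Additive value function: the score is the sum over present tokens."""
    return sum(contributions[j] for j in coalition)


phi = exact_shapley(linear_value, len(contributions))

# For an additive value function, each Shapley value equals that token's
# own contribution, so no coalition enumeration is needed at all.
for token, (p, c) in enumerate(zip(phi, contributions)):
    print(f"token {token}: enumerated={p:.4f} closed_form={c:.4f}")
```

In this toy setting the enumerated values agree with the per-token contributions up to floating-point error, and they sum to the value of the full coalition (the efficiency property of Shapley values).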

Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers

A Pixel-Level Explainable Approach of Convolutional Neural Networks and Its Application

Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers

Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods

C2F-Explainer: Explaining Transformers Better Through a Coarse-to-Fine Strategy

TAME: Attention Mechanism Based Feature Fusion for Generating Explanation Maps of Convolutional Neural Networks

T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers

Generic Attention-model Explainability by Weighted Relevance Accumulation

R-Cut: Enhancing Explainability in Vision Transformers with Relationship Weighted Out and Cut

MACE: Model Agnostic Concept Extractor for Explaining Image Classification Networks

Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification

Explaining deep multi-class time series classifiers

Solving the enigma: Deriving optimal explanations of deep networks

Improving Network Interpretability via Explanation Consistency Evaluation

Explainability of Speech Recognition Transformers Via Gradient-Based Attention Visualization

CNN-based explanation ensembling for dataset, representation and explanations evaluation

Efficient Shapley Values Calculation for Transformer Explainability

Multi-Layer Attention-Based Explainability via Transformers for Tabular Data

Human Guided Exploitation of Interpretable Attention Patterns in Summarization and Topic Segmentation

The Explainability of Transformers: Current Status and Directions

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention