Abstract: Transformer interpretability is an active research topic in deep learning. Traditional interpretation methods mostly use the final-layer output of the Transformer encoder as masks to generate an explanation map. However, these approaches overlook two crucial aspects. At the coarse-grained level, the mask may contain uncertain information, including unreliable and incomplete object location data; at the fine-grained level, information is lost on the mask, resulting in spatial noise and missing detail. To address these issues, we propose a two-stage coarse-to-fine strategy (C2F-Explainer) for improving Transformer interpretability. Specifically, we first design a sequential three-way mask (S3WM) module to handle uncertain information at the coarse-grained level. This module uses sequential three-way decisions to process the mask, preventing uncertain information on the mask from affecting the interpretation results and thus yielding coarse-grained interpretation results with accurate positions. Second, to further reduce the impact of information loss at the fine-grained level, we devise an attention fusion (AF) module, inspired by the fact that self-attention can capture global semantic information. AF aggregates the attention matrices to generate a cross-layer relation matrix, which is then used to optimize detailed information in the interpretation results and produce fine-grained interpretation results with clear and complete edges. Experimental results show that the proposed C2F-Explainer produces good interpretation results on both natural and medical image datasets, improving mIoU by 2.08% on the PASCAL VOC 2012 dataset.
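The coarse-to-fine idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`three_way_mask`, `fuse_attention`, `refine`), the thresholds `alpha` and `beta`, the use of a plain average across layers, and the matrix-vector refinement step are all assumptions made for illustration. The sketch also applies only a single three-way decision and does not model the sequential, multi-stage aspect of S3WM.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def three_way_mask(scores: List[float], alpha: float = 0.7,
                   beta: float = 0.3) -> Tuple[List[float], List[bool]]:
    """One three-way decision over per-token mask scores (hypothetical helper).

    Scores >= alpha are accepted and kept, scores <= beta are rejected,
    and scores strictly in between are deferred as uncertain. Rejected and
    deferred entries are zeroed so they cannot influence the coarse map.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    kept = [s if s >= alpha else 0.0 for s in scores]
    uncertain = [beta < s < alpha for s in scores]
    return kept, uncertain


def fuse_attention(layers: List[Matrix]) -> Matrix:
    """Average per-layer (head-averaged) attention into one relation matrix.

    A plain mean across layers is an assumed stand-in for the paper's
    attention fusion (AF) aggregation.
    """
    n = len(layers[0])
    return [[sum(layer[i][j] for layer in layers) / len(layers)
             for j in range(n)]
            for i in range(n)]


def refine(coarse: List[float], relation: Matrix) -> List[float]:
    """Propagate coarse token scores through the relation matrix (R @ c)."""
    return [sum(r * c for r, c in zip(row, coarse)) for row in relation]


if __name__ == "__main__":
    # Toy example with three tokens and two layers of attention.
    coarse, deferred = three_way_mask([0.9, 0.5, 0.1])
    relation = fuse_attention([
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
        [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    ])
    print("coarse map:", coarse)
    print("deferred tokens:", deferred)
    print("fine map:", refine(coarse, relation))
```

In this toy setup the second token is deferred as uncertain and removed from the coarse map, yet it can regain relevance in the fine map because the fused attention relates it to a confidently accepted token. This mirrors, in a very simplified form, the division of labor between the coarse and fine stages.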

The Explainability of Transformers: Current Status and Directions

Explainability of Vision Transformers: A Comprehensive Review and New Perspectives

Better Explain Transformers by Illuminating Important Information

T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers

Combining Transformers with Natural Language Explanations

Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers

Explaining How Transformers Use Context to Build Predictions

R-Cut: Enhancing Explainability in Vision Transformers with Relationship Weighted Out and Cut

Multi-Layer Attention-Based Explainability via Transformers for Tabular Data

Transformer-based land use and land cover classification with explainability using satellite imagery

Explainability of Text Processing and Retrieval Methods: A Critical Survey

Attention Mechanisms Don't Learn Additive Models: Rethinking Feature Importance for Transformers

Attention Meets Post-hoc Interpretability: A Mathematical Perspective

C2F-Explainer: Explaining Transformers Better Through a Coarse-to-Fine Strategy

Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods

Transformers are Expressive, But Are They Expressive Enough for Regression?

Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI

Transformers Meet Visual Learning Understanding: A Comprehensive Review

Transformers in computational visual media: A survey

On the Faithfulness of Vision Transformer Explanations

From Understanding to Utilization: A Survey on Explainability for Large Language Models